10 Replies Latest reply on Feb 24, 2009 2:20 PM by MicahVillmow

    Another Wavefront Question

    ryta1203

      Let's assume you are using 5 GPRs per thread, so you can have 51 wavefronts active (256/5 = 51, rounding down).
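Here is a minimal C sketch of the arithmetic above. It assumes, as this post does, a budget of 256 GPRs per thread slot on a SIMD. The macro `GPR_BUDGET` and the function `wavefronts_by_gpr` are hypothetical names used only for illustration. The usable register count can also depend on the driver and the shader mode.

```c
#include <assert.h>

/* Per-thread register budget quoted in this discussion (256 GPRs); treat it
   as an assumption, since the usable count depends on driver and shader mode. */
#define GPR_BUDGET 256

/* Wavefronts whose registers fit when each thread uses gprs_per_thread
   GPRs; integer division discards the remainder (256 / 5 -> 51). */
static int wavefronts_by_gpr(int gprs_per_thread)
{
    assert(gprs_per_thread > 0 && gprs_per_thread <= GPR_BUDGET);
    return GPR_BUDGET / gprs_per_thread;
}
```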

      When a wavefront completes, does a new wavefront allocate resources and get put in the dispatcher? If so, how many wavefronts beyond the number actually executing do you need so that this does not show up in performance?

      What I mean is: do wavefront batches (assuming a batch is how many can run in parallel, i.e. dispatcher + executing) execute serially, or is there some overlap (a new wavefront is created as soon as an old wavefront finishes)?

        • MicahVillmow

          Wavefronts are created as long as resources are available. If they cannot execute right away because other wavefronts are executing, they are put on the run queue and wait for the executing wavefronts to stall. Once an executing wavefront stalls, a wavefront from the run queue starts executing.
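The run-queue behavior described above can be illustrated with a toy C simulation. The model makes several assumptions. The SIMD executes one wavefront at a time. Each wavefront alternates fixed-length ALU bursts with fixed-latency memory stalls. The run queue is scanned round-robin. None of this is AMD's documented scheduler, and `simulate_run_queue` is a hypothetical name.

```c
#include <assert.h>

#define MAX_WF 64  /* arbitrary bound for this toy model */

/* Toy model (an assumption, not AMD's documented scheduler): one wavefront
   executes at a time. Each wavefront runs `bursts` ALU bursts of `alu`
   cycles and, between bursts, stalls `lat` cycles on a memory fetch. When
   the executing wavefront stalls, the next ready wavefront from the run
   queue (scanned round-robin) starts executing. Returns total cycles. */
static long simulate_run_queue(int n, int bursts, int alu, int lat)
{
    long ready_at[MAX_WF]; /* cycle at which each wavefront may run again */
    int left[MAX_WF];      /* ALU bursts still to execute */
    long now = 0;
    int done = 0, next = 0;

    assert(n > 0 && n <= MAX_WF && bursts > 0 && alu > 0 && lat >= 0);
    for (int i = 0; i < n; i++) {
        ready_at[i] = 0;
        left[i] = bursts;
    }
    while (done < n) {
        int pick = -1;
        for (int k = 0; k < n && pick < 0; k++) {
            int i = (next + k) % n;
            if (left[i] > 0 && ready_at[i] <= now)
                pick = i;
        }
        if (pick < 0) {
            /* Every unfinished wavefront is stalled: the SIMD sits idle
               until the earliest memory fetch returns. */
            long soonest = -1;
            for (int i = 0; i < n; i++)
                if (left[i] > 0 && (soonest < 0 || ready_at[i] < soonest))
                    soonest = ready_at[i];
            now = soonest;
            continue;
        }
        now += alu;                      /* run one ALU burst */
        if (--left[pick] == 0)
            done++;                      /* wavefront finished */
        else
            ready_at[pick] = now + lat;  /* stalls on its next fetch */
        next = (pick + 1) % n;
    }
    return now;
}
```

Consider 4 bursts of 10 ALU cycles each with a 100-cycle memory latency. A single wavefront needs 340 cycles to do 40 cycles of ALU work. With 11 wavefronts, 440 cycles of work finish in 440 cycles, because some wavefront is always ready whenever the executing one stalls.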

            • ryta1203

              Thanks.

              So if you can have 1024 threads max on a SIMD, that means you can have 16 wavefronts max on a SIMD, yes?
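The 16-wavefront figure comes from dividing the 1024-thread cap by the 64-thread wavefront size of RV770-class GPUs such as the HD 4850. The C sketch below combines that cap with the GPR limit. The macros and function names are hypothetical. The 1024-thread and 256-GPR figures are the ones quoted in this discussion, not values taken from AMD documentation, and the cap may not hold in every shader mode.

```c
#include <assert.h>

#define WAVEFRONT_SIZE 64          /* threads per wavefront on RV770-class GPUs */
#define MAX_THREADS_PER_SIMD 1024  /* thread cap quoted in this discussion */
#define GPR_BUDGET 256             /* register budget quoted in this discussion */

/* Wavefront cap implied by the thread cap: 1024 / 64 = 16. */
static int wavefront_cap(void)
{
    return MAX_THREADS_PER_SIMD / WAVEFRONT_SIZE;
}

/* Wavefronts that can be resident at once: the smaller of the GPR limit
   and the thread-based cap. */
static int resident_wavefronts(int gprs_per_thread)
{
    assert(gprs_per_thread > 0 && gprs_per_thread <= GPR_BUDGET);
    int by_gpr = GPR_BUDGET / gprs_per_thread;
    int cap = wavefront_cap();
    return by_gpr < cap ? by_gpr : cap;
}
```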

              So is there an advantage to having more than 16 wavefronts "running in parallel"?

              Also, as a side question, what is the maximum 3D stream size you can have? I've noticed that some 3D streams won't execute even when x*y*z < 8192*8192.

                • gaurav.garg

                   

                  what is the max 3D stream you can have? I've noticed that some 3D streams won't execute even if x*y*z < 8192x8192.


                  If you are seeing any such behavior, that is a bug on the Brook+ side. Could you post the dimensions where you see these errors? Also, could you check for errors on your declared 3D streams? Do you see errors during stream allocation or during some other operation? (It is possible that your card is out of memory.)

                    • ryta1203

                      In my main code (a non-.br file) I have an array "float4 trace[1001][1001][1]", and in my .br file I have the same array as a stream "float4 trace_s<1001, 1001, 1>". This causes a crash (before you say it, I'd like to have a larger 3rd dimension, but I just used 1 as a test case).

                      I get a stack overflow error when the program begins. I'm using a 4850 512MB vid card.

                        • gaurav.garg

                          Was float4 trace[1001][1001][1] created on the stack, or was it allocated dynamically?

                          A stack overflow error means you are allocating too much memory on the stack.
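For scale, `float4 trace[1001][1001][1]` takes 1001 × 1001 × 16 bytes, about 16 MB. That is far larger than common default stack sizes: 1 MB for MSVC-built Windows executables, and typically 8 MB on Linux. Below is a minimal C sketch of the heap-allocated alternative. The `float4` struct stands in for the Brook+ host-side type and is assumed to be four packed floats. `alloc_trace` and `trace_index` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the Brook+ host-side float4 (assumed: four packed floats). */
typedef struct { float x, y, z, w; } float4;

#define TRACE_X 1001
#define TRACE_Y 1001
#define TRACE_Z 1

/* Allocate the trace array on the heap instead of declaring
   float4 trace[1001][1001][1] as a local variable (~16 MB of stack).
   Returns NULL on failure; release with free(). */
static float4 *alloc_trace(void)
{
    return calloc((size_t)TRACE_X * TRACE_Y * TRACE_Z, sizeof(float4));
}

/* Flat row-major offset equivalent to trace[x][y][z]. */
static size_t trace_index(size_t x, size_t y, size_t z)
{
    return (x * TRACE_Y + y) * TRACE_Z + z;
}
```

Call `free()` on the returned pointer when you are done with it. `t[trace_index(x, y, z)]` addresses the same element that `trace[x][y][z]` would in the original array.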

                            • ryta1203

                              gaurav,

                              You are correct, and I have fixed this; it was just something I noticed.

                              However, my original questions still stand:

                              So if you can have 1024 threads max on a SIMD, that means you can have 16 wavefronts max on a SIMD, yes?

                              So is there an advantage to having more than 16 wavefronts "running in parallel"?

                                • ryta1203

                                   


                                  16 wavefronts is the max that each SIMD engine can run, since each can only run 1024 threads (16 wavefronts × 64 threads).

                                  So what is the benefit of having enough resources (GPRs) for more than 16 wavefronts, since a SIMD can't run more than that anyway?

                                   

                                    • MicahVillmow

                                      Ryta,

                                       16 is the max number of wavefronts available in compute shader mode with LDS usage. In pixel shader mode this depends greatly on the kernel resource usage. 

                                       

                                      The more wavefronts executing in parallel, the more latency hiding you have. By increasing the number of wavefronts executing in parallel, it is possible to take a memory bound kernel and make it computation bound.
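A simplified steady-state model gives a rough picture of why more wavefronts help. It is an assumption, not a formula published by AMD. Suppose each wavefront alternates `alu` cycles of ALU work with a `lat`-cycle memory wait, and the SIMD runs one wavefront at a time. Then ALU utilization is min(1, N·alu / (alu + lat)). Once N reaches ceil((alu + lat) / alu), the waits are fully covered, and the kernel is limited by ALU work rather than memory latency. The function names are hypothetical.

```c
#include <assert.h>

/* Simplified steady-state model (an assumption, not an AMD formula): each
   wavefront alternates `alu` ALU cycles with a `lat`-cycle memory wait, and
   the SIMD issues from one wavefront at a time. */
static double alu_utilization(int wavefronts, int alu, int lat)
{
    assert(wavefronts > 0 && alu > 0 && lat >= 0);
    double u = (double)wavefronts * alu / (alu + lat);
    return u < 1.0 ? u : 1.0; /* cannot exceed a fully busy ALU */
}

/* Smallest wavefront count that covers the wait: ceil((alu + lat) / alu). */
static int wavefronts_to_hide_latency(int alu, int lat)
{
    assert(alu > 0 && lat >= 0);
    return (alu + lat + alu - 1) / alu;
}
```

For example, take 10 ALU cycles per 100-cycle fetch. With 2 wavefronts, the ALU is busy only about 18% of the time. With 11, the latency is hidden completely.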

                                        • ryta1203

                                          Micah,

                                            Is the mode Brook+ runs in documented somewhere? I must have missed it, sorry.

                                            Also, does that mean that in pixel shader mode you can have an unlimited number of wavefronts, provided you have enough resources?

                                            • MicahVillmow

                                              Brook+ currently runs in pixel shader mode. You can see this from the first token of each IL stream, il_ps_2_0; if it switched to compute shader mode, that token would be il_cs_2_0.
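Below is a small C sketch of that check. It assumes you already have the generated IL as a text string; how you obtain it depends on your Brook+ setup. `il_shader_mode` and the `il_mode` enum are hypothetical names. Only the two prefixes mentioned here are recognized.

```c
#include <ctype.h>
#include <string.h>

/* Shader mode implied by the first token of an IL program. */
typedef enum { IL_UNKNOWN, IL_PIXEL_SHADER, IL_COMPUTE_SHADER } il_mode;

/* Classify IL text by its leading token: il_ps_* means pixel shader mode
   (e.g. il_ps_2_0), il_cs_* means compute shader mode (e.g. il_cs_2_0). */
static il_mode il_shader_mode(const char *il)
{
    while (isspace((unsigned char)*il))
        il++; /* skip leading whitespace before the first token */
    if (strncmp(il, "il_ps_", 6) == 0)
        return IL_PIXEL_SHADER;
    if (strncmp(il, "il_cs_", 6) == 0)
        return IL_COMPUTE_SHADER;
    return IL_UNKNOWN;
}
```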

                                               

                                              Also, the driver can limit the number of resources in pixel shader mode because it is part of the graphics pipeline and must share resources with other shader modes. This constraint does not exist in compute shader mode.