13 Replies Latest reply on Jun 2, 2010 2:23 AM by drstrip

    how do I tell which stream core I'm on?

    drstrip

      I would like to generate a separate stream of random numbers for each stream core on each SIMD engine. To do this, a thread needs to know which core and engine it is executing on. Alternatively, I would settle for being able to generate a global address per thread that is guaranteed conflict-free, with the total number of global locations close to the total number of cores.

        • how do I tell which stream core I'm on?
          MicahVillmow
          Execute one work-group per SIMD, and launch only as many groups as there are SIMDs.
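A minimal sketch of the indexing this suggestion implies, written as plain C so it can be checked on the host. It assumes exactly one work-group per SIMD and no more work-groups than SIMDs. The name `state_slot` and its parameters are hypothetical; inside a kernel the values would come from `get_group_id(0)`, `get_local_id(0)` and `get_local_size(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Slot in a global state array for one work-item.
 * Assumption: one work-group per SIMD and no more groups than SIMDs,
 * so no two resident groups ever share a SIMD. */
size_t state_slot(size_t group_id, size_t local_id, size_t local_size)
{
    assert(local_id < local_size);   /* local IDs run from 0 to local_size - 1 */
    return group_id * local_size + local_id;
}
```

For a one-dimensional launch with no global offset this is the same value as `get_global_id(0)`, so the array needs one entry per launched work-item.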
            • how do I tell which stream core I'm on?
              drstrip

              With one workgroup per SIMD, the SIMD ID is obviously the workgroup ID. If the workgroup size is larger than the number of thread cores, is the core ID the work-item ID mod 16? I.e., is it assured that threads 0, 16, 32, 48 are all executed on the same thread core (assuming 16 thread cores per SIMD)?

               

                • how do I tell which stream core I'm on?
                  Fr4nz

                   

                  Originally posted by: drstrip With one workgroup per SIMD, the SIMD ID is obviously the workgroup ID. If the workgroup size is larger than the number of thread cores, is the core ID the work-item ID mod 16? I.e., is it assured that threads 0, 16, 32, 48 are all executed on the same thread core (assuming 16 thread cores per SIMD)?

                   

                   

                  Assuming that you proceed following the way shown by Micah, you should get the core ID through the function "get_group_id":

                  http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/get_group_id.html

                    • how do I tell which stream core I'm on?
                      drstrip

                       

                      Originally posted by: Fr4nz

                      Assuming that you proceed following the way shown by Micah, you should get the core ID through the function "get_group_id":

                      http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/get_group_id.html



                       

                      I'm familiar with the function and was planning to use that to identify the thread within the work group. However, it is my understanding that I should make the workgroup size a multiple of the number of cores for maximum efficiency. In that case, the return value of this function will span a range larger than the number of thread cores, which brings us back to my original question. Let's say I make the workgroup size 64 and have 16 stream cores per SIMD. Do work_items 0-15 execute together, 16-31 together, etc? Do work_items 0,16,32, 48 execute on the same core, while 1, 17, 33, 49 on another, etc.?

                        • how do I tell which stream core I'm on?
                          Fr4nz

                           

                          Originally posted by: drstrip

                          I'm familiar with the function and was planning to use that to identify the thread within the work group. However, it is my understanding that I should make the workgroup size a multiple of the number of cores for maximum efficiency. In that case, the return value of this function will span a range larger than the number of thread cores, which brings us back to my original question. Let's say I make the workgroup size 64 and have 16 stream cores per SIMD. Do work_items 0-15 execute together, 16-31 together, etc? Do work_items 0,16,32, 48 execute on the same core, while 1, 17, 33, 49 on another, etc.?



                          Correct. The important thing is to use a work-group size equal to the wavefront size (a wavefront executes on a single SIMD engine and is 64 threads wide on 5xxx): that way you can be sure how your work-items map to the hardware.
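The mapping confirmed here can be sketched in plain C. This is only the layout described in this thread (64-wide wavefront, 16 stream cores per SIMD), not something the OpenCL specification guarantees; `CORES_PER_SIMD`, `core_of` and `slice_of` are hypothetical names.

```c
enum { CORES_PER_SIMD = 16 };  /* assumption taken from the thread */

/* Stream core a work-item would land on under the layout discussed above. */
unsigned core_of(unsigned local_id)
{
    return local_id % CORES_PER_SIMD;
}

/* Which 16-wide slice of the wavefront the work-item belongs to
 * (work-items 0-15 are slice 0, 16-31 are slice 1, and so on). */
unsigned slice_of(unsigned local_id)
{
    return local_id / CORES_PER_SIMD;
}
```

With a work-group size of 64, work-items 0, 16, 32 and 48 all give `core_of(...) == 0`, while 1, 17, 33 and 49 give 1.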

                            • how do I tell which stream core I'm on?
                              drstrip

                               

                              Originally posted by: Fr4nz

                              Correct. The important thing is to use a work-group size equal to the wavefront size (a wavefront executes on a single SIMD engine and is 64 threads wide on 5xxx): that way you can be sure how your work-items map to the hardware.



                               

                              And am I correct that the RV770 has a wavefront size of 64, as do the new Cypress chips?

                               

                    • how do I tell which stream core I'm on?
                      MicahVillmow
                      drstrip,
                      Our high-end chips have a wavefront size of 64 on both 7XX and 8XX; mid-range chips use 32 and low-end chips use 16.
                        • how do I tell which stream core I'm on?
                          drstrip

                          And what if I need more work_items than

                                  number_of_SIMD_engines * max_work_group size?

                           

                          Will work_group n be executed on

                                 SIMD engine (n mod number_of_SIMD_engines)     ?

                          e.g., for a chip with 10 SIMD engines,

                          workgroups 1, 11, 21, ... will execute on the same engine

                          workgroups 2, 12, 22, ... will execute on the same engine, etc

                          I haven't been able to come up with an experiment to test this conjecture, so if you have ideas ...
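The conjecture in this post can be written down as a one-line function. This is a sketch of the conjecture only, assuming strict round-robin placement; it does not describe confirmed hardware behaviour, and `simd_round_robin` is a hypothetical name.

```c
/* SIMD engine that work-group `group` would run on if placement were
 * strictly round-robin over `num_simd` engines (the conjecture above). */
unsigned simd_round_robin(unsigned group, unsigned num_simd)
{
    return group % num_simd;
}
```

For 10 SIMD engines, work-groups 1, 11 and 21 all map to engine 1, and work-groups 2, 12 and 22 to engine 2.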

                           

                           

                           

                        • how do I tell which stream core I'm on?
                          MicahVillmow
                          Workgroups are scheduled round-robin onto the next free SIMD. I'll check with a couple of the other engineers to verify, but I believe it is as you say. The only alternative would be to place N workgroups onto a single SIMD until that SIMD is full, which would look like this:
                          workgroups 1, 2, 21, 22
                          workgroups 3, 4, 23, 24

                          etc...

                          One of my assumptions is that all workgroups take the same time to execute.
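The alternative "fill one SIMD first" placement can be sketched the same way. The figures of 10 SIMDs and 2 work-groups per SIMD are assumptions inferred from the example lists above, work-groups are numbered from 1 as in those lists, and `simd_fill_first` is a hypothetical name.

```c
/* SIMD engine for 1-based work-group `group` when each SIMD is handed
 * `groups_per_simd` consecutive groups before the scheduler moves on.
 * Assumes all work-groups take the same time to execute. */
unsigned simd_fill_first(unsigned group, unsigned groups_per_simd,
                         unsigned num_simd)
{
    return ((group - 1) / groups_per_simd) % num_simd;
}
```

With 2 groups per SIMD and 10 SIMDs, work-groups 1, 2, 21 and 22 share one engine and 3, 4, 23 and 24 share the next.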
                          • how do I tell which stream core I'm on?
                            MicahVillmow
                            Up to ~24.8 wavefronts can fit on a single SIMD depending on resource constraints, so how the SIMDs receive wavefronts depends on the scheduling mode.
                              • how do I tell which stream core I'm on?
                                bubu

                                Why don't you add a get_compute_unit_id() to the OpenCL 1.1 spec? That would be fantastic, especially for RNG and also for debugging!

                                • how do I tell which stream core I'm on?
                                  drstrip

                                   

                                  Originally posted by: MicahVillmow Up to ~24.8 wavefronts can fit on a single SIMD depending on resource constraints, so how the SIMDs receive wavefronts depends on the scheduling mode.


                                   

                                  In the case of assignment by round-robin, am I correct in interpreting your statement to say that more than one wavefront can be assigned to a SIMD at the same time? If so, is there any way to predict how execution is interleaved among the wavefronts? If a thread in one wavefront does a read-op-write sequence to some global location based on its SIMD and local_id, can this sequence conflict with another wavefront on the SIMD with the same local_id?

                                   

                                  In the schedule-until-full case, we have the question above, plus the question of how to tell how many work-groups have been assigned to the SIMD.

                                   

                                  As bubu writes, a get_compute_unit_id() function would be great, though it will almost certainly take a long time for this to appear in code, even if agreed on tomorrow. It also still requires answers to the questions above about conflicts between wavefronts assigned to the same SIMD.
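The conflict being asked about can be made concrete with a small host-side sketch. It assumes strict round-robin placement (SIMD = work-group number mod number of SIMDs), which is a conjecture in this thread rather than confirmed behaviour; `slot_by_simd` and `slot_by_group` are hypothetical names.

```c
/* Slot keyed only by (SIMD, local_id), as in the question above.
 * Assumption: strict round-robin placement, simd = group % num_simd. */
unsigned slot_by_simd(unsigned group, unsigned local_id,
                      unsigned num_simd, unsigned wg_size)
{
    return (group % num_simd) * wg_size + local_id;
}

/* Slot keyed by (group, local_id): unique per work-item, but the array
 * grows with the number of work-groups rather than the number of cores. */
unsigned slot_by_group(unsigned group, unsigned local_id, unsigned wg_size)
{
    return group * wg_size + local_id;
}
```

With 10 SIMDs and 64 work-items per group, work-item 5 of work-groups 1 and 11 gets the same `slot_by_simd` value (69), so the two wavefronts would share an address if both are resident on that SIMD. Whether their read-op-write sequences actually interleave is the open question; the sketch only shows that the addresses coincide.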