
drstrip
Journeyman III

How do I tell which stream core I'm on?

I would like to generate a separate stream of random numbers for each stream core on each SIMD engine. To do this, a thread needs to know which core and engine it is executing on. Alternatively, I would settle for being able to generate a global address per thread that is guaranteed to be conflict-free, with the total number of such addresses close to the total number of cores.

13 Replies

Execute one work-group per SIMD, and launch only as many work-groups as fit on all the SIMDs.
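As a rough sketch of the sizing suggested here, the host side could compute a global work size that yields exactly one work-group per SIMD. The helper name `one_group_per_simd_global_size` is hypothetical. In a real host program the SIMD count would typically come from `clGetDeviceInfo` with `CL_DEVICE_MAX_COMPUTE_UNITS`, on the assumption that the runtime reports one compute unit per SIMD engine.

```c
#include <stddef.h>

/* Hypothetical helper: global work size that gives exactly one
 * work-group per SIMD engine.
 *   num_simds      : number of SIMD engines (assumed to match the value
 *                    reported for CL_DEVICE_MAX_COMPUTE_UNITS)
 *   workgroup_size : work-items per work-group, e.g. 64 */
size_t one_group_per_simd_global_size(size_t num_simds, size_t workgroup_size)
{
    return num_simds * workgroup_size;
}
```

For a chip with 10 SIMDs and a work-group size of 64 this gives 640 work-items, so get_group_id(0) would range over 0..9. Whether each group actually lands on a distinct SIMD depends on the scheduler.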

With one workgroup per SIMD, the SIMD ID is obviously the workgroup ID. If the workgroup size is larger than the number of thread cores, is the core ID the thread's local ID mod 16? I.e., is it assured that threads 0, 16, 32, 48 are all executed on the same thread core (assuming 16 thread cores per SIMD)?

 


Originally posted by: drstrip With one workgroup per SIMD, the SIMD ID is obviously the workgroup ID. If the workgroup size is larger than the number of thread cores, is the core ID the thread's local ID mod 16? I.e., is it assured that threads 0, 16, 32, 48 are all executed on the same thread core (assuming 16 thread cores per SIMD)?

 

 

Assuming that you proceed in the way Micah described, you should be able to get the core ID through the function "get_group_id":

http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/get_group_id.html


Originally posted by: Fr4nz

Assuming that you proceed in the way Micah described, you should be able to get the core ID through the function "get_group_id":

http://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/get_group_id.html



 

I'm familiar with the function and was planning to use it to identify the thread within the work-group. However, it is my understanding that I should make the work-group size a multiple of the number of cores for maximum efficiency. In that case, the return value of this function will span a range larger than the number of thread cores, which brings us back to my original question. Let's say I make the work-group size 64 and have 16 stream cores per SIMD. Do work-items 0-15 execute together, 16-31 together, etc.? Do work-items 0, 16, 32, 48 execute on the same core, while 1, 17, 33, 49 execute on another, etc.?


Originally posted by: drstrip

I'm familiar with the function and was planning to use it to identify the thread within the work-group. However, it is my understanding that I should make the work-group size a multiple of the number of cores for maximum efficiency. In that case, the return value of this function will span a range larger than the number of thread cores, which brings us back to my original question. Let's say I make the work-group size 64 and have 16 stream cores per SIMD. Do work-items 0-15 execute together, 16-31 together, etc.? Do work-items 0, 16, 32, 48 execute on the same core, while 1, 17, 33, 49 execute on another, etc.?



Correct. The important thing is to use a work-group size equal to the wavefront size (a wavefront executes on a single SIMD engine and contains 64 threads on 5xxx); that way you'll know exactly how your work-items map onto the hardware.
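The mapping being confirmed here can be written down as a small sketch. It assumes, as the thread does, 16 stream cores per SIMD and a work-group of one 64-wide wavefront. The names `lane_t` and `lane_of` are made up for illustration, and `local_id` stands for the value a kernel would get from get_local_id(0).

```c
/* Assumption from the discussion, not from a hardware manual:
 * 16 stream cores per SIMD, work-group size equal to one wavefront. */
enum { CORES_PER_SIMD = 16 };

typedef struct {
    unsigned core;  /* stream core within the SIMD, 0..15 */
    unsigned pass;  /* which batch of 16 work-items, 0..3 for a 64-wide wavefront */
} lane_t;

/* Hypothetical helper: split a local work-item ID into (core, pass). */
lane_t lane_of(unsigned local_id)
{
    lane_t lane;
    lane.core = local_id % CORES_PER_SIMD;
    lane.pass = local_id / CORES_PER_SIMD;
    return lane;
}
```

Under these assumptions, work-items 0, 16, 32, 48 all map to core 0, and work-items 1, 17, 33, 49 all map to core 1.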


Originally posted by: Fr4nz

Correct. The important thing is to use a work-group size equal to the wavefront size (a wavefront executes on a single SIMD engine and contains 64 threads on 5xxx); that way you'll know exactly how your work-items map onto the hardware.



 

And am I correct that the RV770 has a wavefront size of 64, as do the new Cypress chips?

 


drstrip,
Our high-end chips have a wavefront size of 64 on both 7XX and 8XX; mid-range chips use 32 and low-end chips use 16.

And what if I need more work-items than

        number_of_SIMD_engines * max_work_group_size?

 

Will work_group n be executed on

       SIMD engine (n mod number_of_SIMD_engines)     ?

E.g., for a chip with 10 SIMD engines,

workgroups 1, 11, 21, ... will execute on the same engine

workgroups 2, 12, 22, ... will execute on the same engine, etc.

I haven't been able to come up with an experiment to test this conjecture, so if you have ideas ...

 

 

 


Workgroups are scheduled round-robin to the next free SIMD. I'll check with a couple of the other engineers to verify, but I believe it is as you say. The only alternative is that it places N workgroups onto a single SIMD until that SIMD is full, in which case the assignment would look like this (here with N = 2 on your 10-SIMD example):
workgroups 1, 2, 21, 22
workgroups 3, 4, 23, 24

etc.

One of my assumptions is that all workgroups take the same time to execute.
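The two candidate policies described above can be modeled with a short sketch. Both functions are illustrative models, not confirmed hardware behavior: they assume every work-group takes the same time to execute, the names `simd_round_robin` and `simd_fill_first` are invented here, and work-groups are numbered from 0 (the thread counts from 1).

```c
/* Model 1: round-robin. Work-group n goes to SIMD (n mod num_simds). */
unsigned simd_round_robin(unsigned n, unsigned num_simds)
{
    return n % num_simds;
}

/* Model 2: fill-first. Each SIMD receives groups_per_simd consecutive
 * work-groups before the scheduler moves on to the next SIMD. */
unsigned simd_fill_first(unsigned n, unsigned num_simds, unsigned groups_per_simd)
{
    return (n / groups_per_simd) % num_simds;
}
```

With 10 SIMDs, the round-robin model puts work-groups 0, 10, 20 on SIMD 0. The fill-first model with two groups per SIMD puts work-groups 0, 1, 20, 21 on SIMD 0, which matches the "1, 2, 21, 22" pattern above once the numbering is shifted to start at 1.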

In the context of the code I'm working on, the kernel has no branches, so each execution should take the same time, modulo memory contention.

Also, my workgroup size is equal to wavefront size, so presumably that means a workgroup would "fill" a SIMD, right?


Up to ~24.8 wavefronts can fit on a single SIMD, depending on resource constraints, so how the SIMDs receive wavefronts depends on the scheduling mode.

Why don't you add a get_compute_unit_id() to the OpenCL 1.1 spec? That would be fantastic, especially for RNG and also for debugging!


Originally posted by: MicahVillmow Up to ~24.8 wavefronts can fit on a single SIMD, depending on resource constraints, so how the SIMDs receive wavefronts depends on the scheduling mode.


 

In the case of round-robin assignment, am I correct in interpreting your statement to say that more than one wavefront can be assigned to a SIMD at the same time? If so, is there any way to predict how execution is interleaved among the wavefronts? If a thread in one wavefront does a read-op-write sequence to some global location based on its SIMD and local_id, can this sequence conflict with another wavefront on the same SIMD with the same local_id?

 

In the schedule-until-full case, we have the question above, plus the question of how to tell how many work-groups have been assigned to the SIMD.

 

As bubu writes, a get_compute_unit_id() function would be great, though it will almost certainly take a long time for this to appear in code, even if it were agreed on tomorrow. It would also still require answers to the questions above about conflicts between wavefronts assigned to the same SIMD.
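One way to sidestep the placement questions, at the cost of more state slots than stream cores, would be the fallback from the original post: give every work-item its own global slot, so no two work-items can touch the same address wherever they are scheduled. A minimal sketch follows. The name `state_slot` is hypothetical, and its arguments stand for the values a kernel would get from get_group_id(0), get_local_size(0) and get_local_id(0).

```c
#include <stddef.h>

/* Hypothetical helper: a unique slot index per work-item in a 1-D launch.
 * Distinct (group_id, local_id) pairs always give distinct slots as long
 * as local_id < local_size, independent of which SIMD runs the group. */
size_t state_slot(size_t group_id, size_t local_size, size_t local_id)
{
    return group_id * local_size + local_id;
}
```

With one 64-wide work-group per SIMD on a 10-SIMD chip this uses 640 slots, four times the assumed 160 stream cores, but it needs no knowledge of how wavefronts are interleaved.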

 

 
