

Raistmer
Adept II

Work-item distribution between SIMDs

How will wavefronts be formed?

A kernel is executed on a 32x10 execution domain (for example),
and the GPU has 10 SIMDs (HD4870).
How will the threads (work-items) be distributed between them?
Possible variants:
1) 10 wavefronts of 32 threads each are formed; each SIMD executes a single wavefront, so all SIMDs are busy.
2) 5 wavefronts of 64 threads each are formed; 5 SIMDs are loaded with these wavefronts and the other 5 SIMDs stay idle.

Which variant (or maybe some third one?) will actually happen?

And if I set the workgroup size to 32 as a kernel call parameter, will that select variant 1?
omkaranathan
Adept I

The number of threads in a wavefront is min(64, work-group-size).

So in this case, if the group size is 64, case 2 applies: 5 wavefronts are executed on 5 SIMDs and the rest stay idle.

If the group size is 32, case 1 applies: 10 wavefronts are executed on 10 SIMDs.
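
To make the arithmetic in this reply concrete, here is a small sketch in Python. It is only an illustration of the min(64, work-group-size) rule, not how the hardware scheduler is implemented. The function name `wavefront_layout` and its parameters are made up for this example, and it assumes that work-groups are never merged into a shared wavefront and that each wavefront lands on its own SIMD while idle SIMDs remain, as the reply describes.

```python
import math

# Wavefront width used in the formula above (64 on HD4870).
WAVEFRONT_WIDTH = 64


def wavefront_layout(global_size, group_size, num_simds=10):
    """Return (threads per wavefront, total wavefronts, busy SIMDs).

    Hypothetical helper: assumes one work-group is never packed together
    with another into a single wavefront, and that wavefronts are spread
    one per SIMD while idle SIMDs remain.
    """
    # Rule from the reply: threads per wavefront = min(64, work-group-size).
    threads_per_wavefront = min(WAVEFRONT_WIDTH, group_size)
    # Work-groups needed to cover the whole execution domain.
    num_groups = math.ceil(global_size / group_size)
    # A work-group larger than 64 would be split into several wavefronts.
    wavefronts_per_group = math.ceil(group_size / WAVEFRONT_WIDTH)
    total_wavefronts = num_groups * wavefronts_per_group
    # Assumption: at most one wavefront per SIMD until all SIMDs are busy.
    busy_simds = min(total_wavefronts, num_simds)
    return threads_per_wavefront, total_wavefronts, busy_simds


# The 32x10 execution domain from the question, for both group sizes.
for group_size in (64, 32):
    print(group_size, wavefront_layout(32 * 10, group_size))
```

With a group size of 64 this gives 5 wavefronts on 5 SIMDs, and with a group size of 32 it gives 10 half-filled wavefronts on 10 SIMDs, matching cases 2 and 1 from the question.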




Thanks, so I can indeed ensure the packing I need via the group size.
With the default assignment it will be 10 wavefronts, not 5.
At least the OpenCL profiler says so:
the execution domain is 32x10 and 10 wavefronts are formed.