Archives Discussions

Raistmer · ‎05-01-2010

How will wavefronts be formed?

The kernel is executed on a 32x10 execution domain (for example),
and the GPU has 10 SIMDs (HD4870).
How will the threads (work-items) be distributed between them?
Possible variants:
1) 10 wavefronts with 32 threads each will be formed, each SIMD will execute a single wavefront, and all SIMDs are busy.
2) 5 wavefronts with 64 threads each will be formed, 5 SIMDs will be loaded with these wavefronts, and the other 5 SIMDs stay idle.

Which variant (or maybe some third one?) will actually be used?

And if I set the work-group size to 32 as a kernel call parameter, will that select variant 1?

omkaranathan · ‎05-03-2010

The number of threads in a wavefront is min(64, work-group-size).

Hence, in this case, if the group size is 64, case 2 applies: 5 wavefronts will be executed on 5 SIMDs and the rest will be idle.

If the group size is 32, case 1 applies: 10 wavefronts will be executed on 10 SIMDs.
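
To make the arithmetic concrete, here is a rough sketch of the rule above in Python. It is only a back-of-the-envelope model, not anything the driver exposes: the function name `wavefront_layout` is made up for illustration, and it assumes each wavefront lands on its own SIMD until all SIMDs are occupied, which is how the two cases are described in this thread.

```python
import math

WAVEFRONT_WIDTH = 64  # upper bound on threads per wavefront, from the rule above
NUM_SIMDS = 10        # HD4870, as stated in the question


def wavefront_layout(global_size, group_size, num_simds=NUM_SIMDS):
    """Return (threads per wavefront, wavefront count, busy SIMDs).

    Hypothetical helper: models the min(64, work-group-size) rule and
    assumes one wavefront per SIMD until every SIMD is occupied.
    """
    threads_per_wavefront = min(WAVEFRONT_WIDTH, group_size)
    # A work-group larger than 64 is split into several wavefronts.
    wavefronts_per_group = math.ceil(group_size / threads_per_wavefront)
    num_groups = math.ceil(global_size / group_size)
    wavefronts = num_groups * wavefronts_per_group
    busy_simds = min(wavefronts, num_simds)
    return threads_per_wavefront, wavefronts, busy_simds


if __name__ == "__main__":
    total_work_items = 32 * 10  # the 32x10 execution domain from the question
    for group_size in (64, 32):
        print(group_size, wavefront_layout(total_work_items, group_size))
```

For the 32x10 domain this gives 5 wavefronts on 5 SIMDs with a group size of 64, and 10 wavefronts on 10 SIMDs with a group size of 32, matching cases 2 and 1.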

Raistmer · ‎05-03-2010

Thanks, so I can indeed ensure the needed packing via the group size.
With the default assignment it will be 10 wavefronts, not 5.
At least the OpenCL profiler says so:
the execution domain is 32x10, and 10 wavefronts are formed.
