How will wavefronts be formed?
Suppose a kernel is executed on a 32x10 execution domain (for example),
and the GPU has 10 SIMDs (HD4870).
How will the threads (work-items) be distributed among them?
Possible variants:
1) 10 wavefronts of 32 threads each are formed; each SIMD executes a single wavefront, so all SIMDs are busy.
2) 5 wavefronts of 64 threads each are formed; 5 SIMDs are loaded with these wavefronts, and the other 5 SIMDs stay idle.
Which variant (or maybe some third one?) will actually be used?
And if I set the workgroup size to 32 as a kernel call parameter, will that force variant 1?
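
To make the arithmetic behind the two variants concrete, here is a small sketch in Python. It only counts wavefronts and SIMDs under the assumption used in both variants, namely that each wavefront is placed on a different SIMD. The function name wavefront_layout and its parameters are made up for illustration; this does not describe what the hardware or driver really does.

```python
import math

def wavefront_layout(domain_x, domain_y, threads_per_wavefront, simd_count):
    """Count wavefronts and busy/idle SIMDs.

    Assumption (hypothetical model): each wavefront is placed on its own
    SIMD until the SIMDs run out. Real scheduling may differ.
    """
    total_threads = domain_x * domain_y
    # A partially filled wavefront still counts as a whole wavefront.
    wavefronts = math.ceil(total_threads / threads_per_wavefront)
    busy_simds = min(wavefronts, simd_count)
    idle_simds = simd_count - busy_simds
    return wavefronts, busy_simds, idle_simds

# Variant 1: 32 threads per wavefront on a 32x10 domain, 10 SIMDs.
print(wavefront_layout(32, 10, 32, 10))  # (10, 10, 0)
# Variant 2: 64 threads per wavefront on the same domain.
print(wavefront_layout(32, 10, 64, 10))  # (5, 5, 5)
```

The two printed tuples are (wavefronts, busy SIMDs, idle SIMDs), matching variants 1 and 2 as described above.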