The number of threads in a wavefront is min(64, work-group-size).
Hence, in this case, if the group size is 64 then case 2 applies: 5 wavefronts will be executed on 5 SIMDs, and the rest will be idle.
If the group size is 32 then case 1 applies: 10 wavefronts will be executed on 10 SIMDs.
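To make the arithmetic concrete, here is a rough sketch of the rule above in Python. It assumes a total of 320 work-items (which is what gives 5 wavefronts at group size 64 and 10 at group size 32), a wavefront width of 64, and that work-items from different work-groups are never packed into the same wavefront. The function names are made up for illustration and are not part of any OpenCL API.

```python
import math

# Assumed hardware wavefront width, taken from the min(64, ...) rule above.
WAVEFRONT_SIZE = 64


def threads_per_wavefront(group_size: int) -> int:
    """Active threads in one wavefront: min(64, work-group-size)."""
    return min(WAVEFRONT_SIZE, group_size)


def wavefront_count(total_work_items: int, group_size: int) -> int:
    """Wavefronts formed, assuming work-groups never share a wavefront."""
    if total_work_items % group_size != 0:
        raise ValueError("global size must be a multiple of the group size")
    groups = total_work_items // group_size
    # A group larger than one wavefront is split into several wavefronts.
    wavefronts_per_group = math.ceil(group_size / WAVEFRONT_SIZE)
    return groups * wavefronts_per_group


if __name__ == "__main__":
    total = 320  # assumed domain size: 5 x 64 = 10 x 32
    for group_size in (64, 32):
        print(
            f"group size {group_size}: "
            f"{wavefront_count(total, group_size)} wavefronts, "
            f"{threads_per_wavefront(group_size)} threads each"
        )
```

With these assumptions, group size 64 yields 5 full wavefronts, while group size 32 yields 10 wavefronts that are each only half filled.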
Thanks, so I can indeed ensure the needed packing via the group size.
With the default assignment it will be 10 wavefronts, not 5.
At least the OpenCL profiler says so.
The execution domain is 32x10, and 10 wavefronts are formed.