Hi,
I want to know what priority the compiler uses to distribute wavefronts to the SIMD engines (compute units, CUs).
Assume I have 20 wavefronts (as reported by the profiler in Visual Studio). The HD 5870 has 20 SIMD engines.
Which one is correct?
Each SIMD engine gets 1 wavefront, or
1 SIMD engine gets 4 wavefronts (so 5 SIMD engines are used and the remaining 15 SIMD engines sit idle).
----------
The reason I ask the question above:
I observed two cases in my experiments (local work size is set to NULL).
Case 1:
If the total number of work-items (global work size) is large, then dividing it by the number of wavefronts reported by the profiler tells me that 1 wavefront holds 64 work-items (full).
Case 2:
If the total number of work-items is not very large, the compiler seems to only half-fill each wavefront (1 wavefront holds 32 work-items), so the number of wavefronts reported is larger than it would be with full wavefronts. It seems the compiler prefers more wavefronts that are half-filled (32 work-items) over fewer wavefronts that are full (64 work-items). Is that correct?
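To make "some math" concrete, here is a minimal sketch of the arithmetic behind both cases. The function names are hypothetical, and the numbers are illustrative only, not values from my profiler runs. It assumes a full wavefront holds 64 work-items.

```python
import math

def work_items_per_wavefront(global_work_size, reported_wavefronts):
    """Average work-items per wavefront implied by the profiler's wavefront count."""
    return global_work_size / reported_wavefronts

def full_wavefronts_needed(global_work_size, wavefront_size=64):
    """Wavefronts needed if every wavefront were completely filled (assumed size 64)."""
    return math.ceil(global_work_size / wavefront_size)

# Illustrative numbers only (not measurements).
print(work_items_per_wavefront(1280, 20))  # case 1 pattern: 64.0 (full wavefronts)
print(work_items_per_wavefront(640, 20))   # case 2 pattern: 32.0 (half-filled wavefronts)
print(full_wavefronts_needed(640))         # 10 wavefronts if they had been full
```

In the case 2 pattern, the profiler would report twice as many wavefronts as the full-wavefront count predicts, which is the behaviour I am asking about.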
I hope someone can help me with this question. I'm writing a school report, so I don't want to write wrong information in the report.
Thanks.