wavefront distribution

Discussion created by erman_amd on Jun 30, 2011
Latest reply on Jun 30, 2011 by maximmoroz


I want to know what priority the compiler uses when distributing wavefronts to the SIMD engines (CUs).

Assume I have 20 wavefronts (as reported by the profiler in Visual Studio). The HD 5870 has 20 SIMD engines.

Which one is correct:

Each SIMD engine gets 1 wavefront, or

1 SIMD engine gets 4 wavefronts, so 5 SIMD engines are used and the remaining 15 SIMD engines do nothing (idle)?
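
To make the two options concrete, here is a minimal Python sketch of what each one would look like for 20 wavefronts on 20 SIMD engines. The function names `spread` and `pack` and the `per_simd` parameter are hypothetical, made up only for this illustration. The sketch does not claim that the hardware follows either policy; it only shows the resulting occupancy in each case.

```python
NUM_SIMD = 20  # number of SIMD engines on the HD 5870


def spread(num_wavefronts, num_simd=NUM_SIMD):
    """Option 1 (hypothetical): hand out wavefronts round-robin across engines."""
    counts = [0] * num_simd
    for w in range(num_wavefronts):
        counts[w % num_simd] += 1
    return counts


def pack(num_wavefronts, per_simd=4, num_simd=NUM_SIMD):
    """Option 2 (hypothetical): give one engine per_simd wavefronts, then move on."""
    counts = [0] * num_simd
    for w in range(num_wavefronts):
        counts[(w // per_simd) % num_simd] += 1
    return counts


if __name__ == "__main__":
    print("spread:", spread(20))
    print("pack:  ", pack(20))
    print("idle engines when packing:", pack(20).count(0))
```

With 20 wavefronts, `spread` leaves no engine idle, while `pack` uses 5 engines and leaves 15 idle, which is exactly the difference I am asking about.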


The reason I asked the question above:

I observed two cases in my experiments (local work size is set to NULL).

Case 1:

If the total number of work-items (global work size) is large, then from the number of wavefronts reported by the profiler (after I do some math) I can tell that 1 wavefront holds 64 work-items (full).

Case 2:

If the total number of work-items is not very large, the compiler seems to only half-fill each wavefront (1 wavefront holds 32 work-items), so the number of wavefronts reported is larger than I expected. It seems the compiler prefers more wavefronts that are half-filled (32 work-items) over fewer wavefronts that are full (64 work-items). Is that correct?
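
The math I did for both cases can be sketched as below. It assumes that the hardware wavefront is always 64 wide on the HD 5870 and that a wavefront never spans two work-groups, so a work-group of 32 work-items would occupy one half-filled wavefront. The local sizes used here (64 and 32) are hypothetical stand-ins for whatever the runtime picks when the local work size is NULL; I have not confirmed what it actually chooses. The function name `wavefront_stats` is also made up for this example.

```python
WAVEFRONT_SIZE = 64  # assumed hardware wavefront width on the HD 5870


def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def wavefront_stats(global_size, local_size):
    """Return (total wavefronts, average work-items per wavefront).

    Assumes global_size is a multiple of local_size and that each
    work-group is split into its own wavefronts.
    """
    num_groups = global_size // local_size
    waves_per_group = ceil_div(local_size, WAVEFRONT_SIZE)
    total_waves = num_groups * waves_per_group
    return total_waves, global_size / total_waves


if __name__ == "__main__":
    # Case 1: work-groups of 64 give full wavefronts.
    print(wavefront_stats(1280, 64))
    # Case 2: work-groups of 32 give half-filled wavefronts.
    print(wavefront_stats(640, 32))
```

Under these assumptions, a local size of 32 would produce the same number of wavefronts for half as many work-items, which matches what I see in Case 2.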

I hope someone can help me with this question. I'm writing a school report, so I don't want to write wrong information in the report.