erman_amd
Journeyman III

wavefront distribution

Hi,

I want to know the policy used by the compiler to distribute wavefronts to the SIMD engines (CUs).

Assume I have 20 wavefronts (as reported by the profiler in Visual Studio). The HD 5870 has 20 SIMD engines.

Which one is correct:

Each SIMD engine gets 1 wavefront, or

1 SIMD engine gets 4 wavefronts (so 5 SIMD engines are used and the remaining 15 SIMD engines sit idle)?

----------

The reason I asked the question above:

I saw two cases in my experiments (local work size is set to NULL).

Case 1:

If the total number of work-items (global work size) is large, then from the number of wavefronts reported by the profiler (after I do some math) I can tell that 1 wavefront holds 64 work-items (full).

Case 2:

If the total number of work-items is not very large, the compiler chooses to only half-fill each wavefront (1 wavefront holds 32 work-items), so the number of wavefronts reported is still fairly large. It seems the compiler prefers more wavefronts (even if each is only half-filled, with 32 work-items) over fewer wavefronts (each full, with 64 work-items). Is that correct?

I hope someone can help me with this question. I'm writing a school report, so I don't want to write wrong information in the report.

Thanks.

1 Reply
maximmoroz
Journeyman III

Just a note: it is not the compiler but the AMD APP runtime (Catalyst driver) that chooses the local work size when none is specified in clEnqueueNDRangeKernel.
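
To make the arithmetic behind the two cases concrete, here is a rough sketch in C of how the wavefront count follows from the global and local work sizes. It assumes a 1D NDRange, 64 work-items per wavefront (as on the HD 5870), and that each work-group is split into wavefronts on its own, so a partially filled wavefront still counts as one. The names `ceil_div` and `wavefront_count` are made up for illustration; they are not part of OpenCL or the AMD APP SDK.

```c
#include <stddef.h>

/* Assumption: 64 work-items per wavefront, as on the HD 5870. */
#define WAVEFRONT_SIZE 64

/* Hypothetical helper: integer division rounded up. */
static size_t ceil_div(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/*
 * Hypothetical helper: number of wavefronts for a 1D NDRange.
 * Each work-group is divided into wavefronts separately, so a
 * work-group of 32 work-items still occupies one (half-filled)
 * wavefront.
 */
static size_t wavefront_count(size_t global_size, size_t local_size)
{
    size_t groups = ceil_div(global_size, local_size);
    size_t waves_per_group = ceil_div(local_size, WAVEFRONT_SIZE);
    return groups * waves_per_group;
}
```

Under these assumptions, `wavefront_count(1280, 64)` and `wavefront_count(640, 32)` both give 20: with a local work size of 32, half as many work-items produce the same number of wavefronts, each half-filled. This only shows the counting; it says nothing about how the runtime picks the local work size or how the hardware assigns wavefronts to SIMD engines.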
