Hello.
From the manual:
"For example, the 128 threads in a 2D work-group of dimension 32x4 (X=32 and Y=4) would be packed into two wavefronts as follows (notation shown in X,Y order):"
(see table on page 4-37)
But I don't see such packing along the Y dimension in my kernel.
With 32 points in X and 12 points in Y (384 work-items), I see 12 wavefronts. I would expect 6 if packing along Y worked as the manual describes.
Example from the profiler (one kernel row):
Method: PC_single_pulse_kernel_FFA_update_vectorised_05AA3CE0
ExecutionOrder: 14
GlobalWorkSize: { 32 12 1}
GroupWorkSize: NULL
Time: 2,90835
LDSSize: 0
DataTransferSize: (blank)
GPRs: 52
ScratchRegs: 0
FCStacks: 3
Wavefronts: 12,00
ALUInsts: 25736,00
FetchInsts: 4371,00
WriteInsts: 849,00
ALUBusy: 5,85
ALUFetchRatio: 5,89
ALUPacking: 84,88
FetchSize: 26330,25
CacheHit: 0,00
FetchUnitBusy: 25,95
FetchUnitStalled: 19,33
WriteUnitStalled: 0,00
Why does the profiler show 12 wavefronts and not 6?
Is this a problem with the profiler, a bug in the manual, or is some other factor at play (and if so, what)?