I have an AMD 7970 and noticed that the occupancy value reports a hardware limit of 40 wavefronts. My question is: are the 40 wavefronts per compute unit? Otherwise I can't imagine how 40 wavefronts would be distributed across 32 CUs. I also thought 40 wavefronts would be far too few for the whole GPU to hide latencies well.
Oh, I see. I read that one SIMD unit can hold up to 10 wavefronts, and there are 4 SIMD units inside one compute unit. That means the occupancy limit of 40 wavefronts is per CU. This looks really great, because it works out to 40*32*64=81920 threads that the whole 7970 GPU can keep in flight at once. That is amazing.
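For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The constants are just the figures quoted in this thread (4 SIMD units per CU, 10 wavefronts per SIMD, 32 CUs, 64 work-items per wavefront); nothing is queried from the hardware, and the function names are made up for illustration. Note that this is only the upper bound: a real kernel may reach fewer wavefronts per CU if it uses a lot of registers or local memory.

```python
# Figures as quoted in this thread for the 7970 (assumed, not read from the device).
SIMDS_PER_CU = 4
WAVEFRONTS_PER_SIMD = 10
COMPUTE_UNITS = 32
WORK_ITEMS_PER_WAVEFRONT = 64


def max_wavefronts_per_cu(simds_per_cu=SIMDS_PER_CU,
                          wavefronts_per_simd=WAVEFRONTS_PER_SIMD):
    """Upper bound on wavefronts resident on one compute unit."""
    return simds_per_cu * wavefronts_per_simd


def max_resident_work_items(compute_units=COMPUTE_UNITS,
                            work_items_per_wavefront=WORK_ITEMS_PER_WAVEFRONT):
    """Upper bound on work-items (threads) resident on the whole GPU."""
    return max_wavefronts_per_cu() * compute_units * work_items_per_wavefront


if __name__ == "__main__":
    print(max_wavefronts_per_cu())    # 4 * 10 = 40
    print(max_resident_work_items())  # 40 * 32 * 64 = 81920
```

So the 40 is a per-CU limit, and the whole-GPU figure only appears once you multiply by the number of CUs and the wavefront size.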