Optimizing code for GCN

Question asked by rwelsch on May 11, 2012



I have some basic questions concerning GCN and optimizing my code for this architecture.


As far as I understand, there is a limit of 10 wave-fronts per SIMD unit, and a Compute Unit is composed of 4 SIMD units, so I can run 40 wave-fronts on one Compute Unit in parallel, right?

There is also a limit of 16 work-groups per Compute Unit if the work-group size is greater than 1 wave-front (64 work items). Does that mean that I can run 40 work-groups on one Compute Unit if my work-group size is 64 (= 1 wave-front)?

The other limiting resources would then be 25 registers per work item (64 kB of registers shared by 10 wave-fronts) and 1.6 kB of local memory per work-group (64 kB shared by 40 work-groups). Is this correct, or am I missing something?
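
The arithmetic behind these figures can be sketched in a few lines of Python. This is only a sanity check: every constant is a value quoted in the question rather than a verified hardware specification, and the 4-byte register size and the function names are assumptions made for illustration.

```python
# Values quoted in the question; treat them as assumptions,
# not as confirmed hardware specifications.
WAVEFRONT_SIZE = 64                        # work items per wave-front
SIMDS_PER_CU = 4                           # SIMD units per Compute Unit
MAX_WAVES_PER_SIMD = 10                    # wave-front limit per SIMD unit
REGISTER_FILE_BYTES_PER_SIMD = 64 * 1024   # "64 kB of registers for 10 wave-fronts"
LOCAL_MEMORY_BYTES_PER_CU = 64 * 1024      # "64 kB for 40 work-groups"
BYTES_PER_REGISTER = 4                     # assumption: 32-bit registers


def registers_per_work_item(waves_per_simd):
    """Whole registers available to each work item at a given occupancy."""
    work_items = waves_per_simd * WAVEFRONT_SIZE
    return REGISTER_FILE_BYTES_PER_SIMD // (work_items * BYTES_PER_REGISTER)


def local_memory_per_group(groups_per_cu):
    """Bytes of local memory per work-group if split evenly."""
    return LOCAL_MEMORY_BYTES_PER_CU / groups_per_cu


max_waves_per_cu = SIMDS_PER_CU * MAX_WAVES_PER_SIMD
regs = registers_per_work_item(MAX_WAVES_PER_SIMD)
lds_bytes = local_memory_per_group(max_waves_per_cu)

print("wave-fronts per Compute Unit:", max_waves_per_cu)
print("registers per work item:", regs)
print("local memory per work-group (kB):", lds_bytes / 1024)
```

With these inputs the sketch gives 40 wave-fronts per Compute Unit, 25 whole registers per work item (25.6 before rounding down) and 1.6 kB of local memory per work-group, which matches the figures in the question. It does not model any allocation granularity the hardware may impose.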


Thanks for all the help. I'm just getting started with GPU computing, OpenCL, etc.