
Raistmer · ‎02-24-2011

smth like CUDA's one for ATi GPUs

Does something like the CUDA Occupancy Calculator exist for ATi GPUs?

Did AMD ever bother to create something like this, or is there maybe some third-party development?

Jawed · ‎02-28-2011

See section 4.9.2 in the APP OpenCL Programming Guide.

The basic fact is that there are 16384 GPRs per compute unit, which are split evenly amongst hardware threads.

As well as the headline GPR allocation reported by SKA, the clause-temporary GPRs need to be accounted for. On the Evergreen series there can be up to 8 of these per work item. Earlier GPUs support 4. Cayman doesn't seem to use them (I haven't seen any in SKA), but I'm not sure.

So 8 GPRs per work item * 64 work items per hardware thread * 2 hardware threads in the pipeline = 1024 registers of maximum overhead from clause-temporary registers.

So subtract the actual clause-temporary overhead from 16384 first, then you can perform the simple division that tells you how many hardware threads (wavefronts) you'll get.
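
The arithmetic above can be sketched in a few lines of Python. This is only a rough illustration of the GPR limit as described in this post, not an official AMD calculator: the function names are made up for the example, the constants are the figures quoted above, and it ignores every other limit (LDS usage, hardware caps on wavefronts, and so on).

```python
# Rough sketch of the GPR-limited wavefront count described above.
# All names are hypothetical; constants are the figures quoted in the post.

TOTAL_GPRS = 16384              # GPR pool that is split amongst hardware threads
WORK_ITEMS_PER_WAVEFRONT = 64   # work items per hardware thread (wavefront)
WAVEFRONTS_IN_PIPELINE = 2      # hardware threads holding clause temporaries at once


def clause_temp_overhead(clause_temps_per_item):
    """Registers reserved for clause temporaries across the pipeline."""
    return (clause_temps_per_item
            * WORK_ITEMS_PER_WAVEFRONT
            * WAVEFRONTS_IN_PIPELINE)


def wavefronts_by_gprs(gprs_per_item, clause_temps_per_item=0):
    """Wavefronts that fit, considering only the GPR limit.

    gprs_per_item is the headline GPR count reported by SKA;
    clause_temps_per_item is the number of clause temporaries the kernel
    actually uses (up to 8 on Evergreen, per the post).
    """
    if gprs_per_item <= 0:
        raise ValueError("gprs_per_item must be positive")
    available = TOTAL_GPRS - clause_temp_overhead(clause_temps_per_item)
    return available // (gprs_per_item * WORK_ITEMS_PER_WAVEFRONT)


if __name__ == "__main__":
    # Example inputs chosen for illustration only.
    print(clause_temp_overhead(8))
    print(wavefronts_by_gprs(16, clause_temps_per_item=8))
```

The result is only an upper bound from register pressure; the real number of wavefronts in flight can be lower once the other resource limits are taken into account.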

The numbers in the table look like they account for clause temporary registers, but it seems to be a generalisation for kernels that use 4 temporaries.

Raistmer · ‎02-28-2011

nVidia's table accounts for more than just registers: it also covers shared memory, the maximum possible number of blocks in flight, and so on. Yes, almost all of this info can be found in the manual (for HD5xxx, not for HD4xxx, btw), but nVidia's presentation makes life easier, while AMD again and again "says" RTFM ...