I used a Radeon 5850 for OpenCL development and have now upgraded to a Radeon R9 280.
Using CodeXL I profiled the same kernel on the 5850 and R9 280 and I got 100% occupancy on the 5850 and 40% on the 280. CodeXL shows 7 VGPRs used on the 5850 and 52 VGPRs + 28 SGPRs used on the 280.
So my question is basically: WTF? First, is the notion of VGPRs different between Cypress and Tahiti? And if not, why is the code using so many more registers on Tahiti?
By the way, the kernel does run twice as fast on the 280 compared to the 5850 (according to CodeXL's measurement), which is about what I expected, but I still find this measurement strange and I'm not sure how to deal with it.
I'm not very familiar with the 5000 series, but it is my understanding that it is VLIW5.
Registers there are, to my understanding, 128-bit vectors of four 32-bit components, so your 7 are more akin to 28 registers in GCN.
VGPRs on GCN are not vector registers in the sense they used to be. Each one is a scalar 32-bit register private to a single SIMD lane, i.e. to one work-item.
Branching also increases register usage, as do some compiler optimizations. Without having an idea of your kernel, it's hard to say more.
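For what it's worth, the 40% figure is what you would expect if VGPRs are the limiting factor. Here is a rough sketch of the arithmetic, assuming each GCN SIMD has a budget of 256 VGPRs per lane, can hold at most 10 wavefronts, and allocates VGPRs in blocks of 4. Those three numbers are my assumptions from memory of AMD's GCN material, the function name is made up for illustration, and the sketch ignores other limits such as SGPRs, LDS and work-group size.

```python
def gcn_waves_per_simd(vgprs, vgpr_budget=256, max_waves=10, granularity=4):
    """Estimate wavefronts resident on one SIMD when VGPRs are the only limit.

    vgpr_budget, max_waves and granularity are assumed GCN figures,
    not values reported by CodeXL.
    """
    # Round the request up to the assumed allocation granularity
    # (and never below one block, to avoid dividing by zero).
    allocated = max(granularity, -(-vgprs // granularity) * granularity)
    # Each resident wavefront needs its own copy of the allocated registers.
    return min(max_waves, vgpr_budget // allocated)


waves = gcn_waves_per_simd(52)   # 52 VGPRs, as reported for the R9 280
occupancy = waves / 10           # relative to the assumed 10-wave maximum
print(f"{waves} waves per SIMD, {occupancy:.0%} occupancy")
# prints: 4 waves per SIMD, 40% occupancy
```

Under the same assumptions you would need to get down to 25 VGPRs or fewer (24 after rounding to the block size) to reach 10 waves per SIMD, but since the kernel already runs at the speed you expected, lower occupancy is not necessarily a problem.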