I am trying to reduce the VGPR usage of my kernel so that I can increase its occupancy. I am using CodeXL 1.5 for analysis and profiling. I have brought VGPR usage down from ~92 to ~48 according to the analyzer, but when I run the profiler it reports a different number of VGPRs (~58) for the same kernel. I have made sure that the optimization settings are exactly the same in both cases. What else could explain the discrepancy between what the profiler and the analyzer report?
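To show why the difference matters to me, here is a rough sketch of the occupancy arithmetic. It assumes a GCN-class GPU with a budget of 256 VGPRs per SIMD lane, a cap of 10 waves per SIMD, and VGPRs allocated in blocks of 4; those figures and the function name `waves_per_simd` are my assumptions for illustration, not something CodeXL reported, and it ignores SGPR and LDS limits.

```python
def waves_per_simd(vgprs, vgpr_budget=256, max_waves=10, granularity=4):
    """Estimate VGPR-limited waves per SIMD (assumed GCN-style limits)."""
    # Round the register count up to the assumed allocation granularity.
    allocated = -(-vgprs // granularity) * granularity
    # Occupancy is capped both by the register budget and by the wave limit.
    return min(max_waves, vgpr_budget // allocated)


if __name__ == "__main__":
    # Original count, analyzer count, profiler count (all approximate).
    for count in (92, 48, 58):
        print(count, "VGPRs ->", waves_per_simd(count), "waves per SIMD")
```

Under these assumptions, 48 VGPRs would allow 5 waves per SIMD while 58 would allow only 4, so which of the two numbers is correct changes the occupancy I should expect.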
Is there a strategy for converting VGPR usage into SGPR usage? The kernel only uses ~22 SGPRs now, so the SGPRs appear to be underused, and a better balance between the two should yield better performance.
Finally, it would be nice to be able to turn off optimizations for small sections of code in order to reduce VGPR usage; does something like this exist?