Latest reply on Jan 31, 2012 10:50 AM by MicahVillmow

    7970 - less codeLenInByte or NumSgprs


      Both values can be checked with KernelAnalyzer, so here is my question:


      Is it better to optimize an OpenCL kernel for a lower NumSgprs or for shorter compiled code (codeLenInByte)? And how many fewer bytes of code would it take to outweigh using one additional SGPR? Also, could someone explain what SGPR stands for? The only thing I can think of is scalar general-purpose register; am I right?




        • Re: 7970 - less codeLenInByte or NumSgprs

          SGPR stands for scalar general-purpose register.


          In most cases, optimizing for a lower NumSGPRs is the better choice. In some cases, however, you can over-optimize and degrade performance. Whether this happens depends on the algorithm and its implementation, so it is not a rule that holds in general. For example, if your implementation already has good caching behavior and you reduce its GPR count, more wavefronts can be in flight at once, and together they can thrash the caches and make performance worse.
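          To make the trade-off concrete, here is a minimal back-of-the-envelope sketch, not AMD's actual scheduler logic. It assumes a GCN SIMD with 512 SGPRs, 256 VGPRs per lane, allocation granularities of 8 SGPRs and 4 VGPRs, a cap of 10 wavefronts per SIMD, 4 SIMDs per compute unit, and a 16 KB vector L1 per compute unit. It ignores reserved SGPRs (such as VCC) and LDS limits. The helper names and the 512-byte per-wavefront working set are hypothetical. Please check these figures against the hardware documentation before relying on them.

```python
# Rough occupancy model for a GCN SIMD (e.g. Radeon HD 7970).
# ASSUMED limits: 512 SGPRs/SIMD, 256 VGPRs/lane, allocation granularity
# 8 SGPRs / 4 VGPRs, max 10 wavefronts/SIMD. Reserved SGPRs (VCC etc.)
# and LDS limits are ignored, so real occupancy can be lower.

MAX_WAVES_PER_SIMD = 10
SGPRS_PER_SIMD = 512
VGPRS_PER_LANE = 256
SGPR_GRANULE = 8
VGPR_GRANULE = 4


def round_up(value: int, granule: int) -> int:
    """Round a register count up to the allocation granularity."""
    return -(-value // granule) * granule


def waves_per_simd(num_sgprs: int, num_vgprs: int) -> int:
    """Wavefronts resident on one SIMD, limited only by register usage."""
    by_sgpr = SGPRS_PER_SIMD // round_up(max(num_sgprs, 1), SGPR_GRANULE)
    by_vgpr = VGPRS_PER_LANE // round_up(max(num_vgprs, 1), VGPR_GRANULE)
    return min(MAX_WAVES_PER_SIMD, by_sgpr, by_vgpr)


def working_set_fits(total_waves: int, bytes_per_wave: int, cache_bytes: int) -> bool:
    """True if the working sets of all resident wavefronts fit in the cache."""
    return total_waves * bytes_per_wave <= cache_bytes


if __name__ == "__main__":
    L1_BYTES_PER_CU = 16 * 1024  # assumed 16 KB vector L1 per compute unit
    SIMDS_PER_CU = 4             # assumed 4 SIMDs per GCN compute unit
    bytes_per_wave = 512         # hypothetical per-wavefront working set
    for sgprs in (96, 48):
        waves_cu = SIMDS_PER_CU * waves_per_simd(sgprs, 24)
        fits = working_set_fits(waves_cu, bytes_per_wave, L1_BYTES_PER_CU)
        print(f"{sgprs} SGPRs: {waves_cu} waves/CU, fits in L1: {fits}")
```

          Under these assumptions, cutting the kernel from 96 to 48 SGPRs doubles occupancy from 20 to 40 wavefronts per compute unit. The combined working set then grows from 10 KB to 20 KB and no longer fits in the assumed 16 KB L1. That is the kind of situation in which a lower register count can end up slower.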