Archives Discussions

diapolo · 01-31-2012

Both values (codeLenInByte and NumSgprs) are reported by KernelAnalyzer, so here is my question:

Is it better to optimize an OpenCL kernel for lower NumSgprs usage or for shorter compiled kernel code? And how many fewer code bytes would it take to outweigh the cost of one additional SGPR? Also, it would be nice to get an explanation of what SGPR stands for. I can only think of scalar general-purpose registers; am I right?

Thanks,

Dia

MicahVillmow · 01-31-2012

SGPR stands for scalar general-purpose register.

In most cases, optimizing for a lower NumSGPRs count is the better choice. However, in some cases you can over-optimize and degrade performance. Whether it pays off depends on the algorithm and its implementation, so it is not a rule that holds in the general case. For example, if your implementation has good caching behavior and you optimize to lower the GPR count, you can get worse performance: the lower register count lets more wavefronts be in flight, and those extra wavefronts can thrash the caches.
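To make the "more wavefronts in flight" point concrete, here is a minimal occupancy sketch. It is not from the thread, and the hardware constants are assumptions about first-generation GCN (HD 7970): up to 10 wavefronts per SIMD, 512 SGPRs per SIMD allocated in blocks of 8, and 256 VGPRs per lane allocated in blocks of 4. The helper names `round_up` and `waves_per_simd` are hypothetical, and the model ignores other limits such as LDS and work-group size.

```python
# Illustrative register-limited occupancy model for a GCN SIMD.
# All constants are ASSUMED values for first-generation GCN (HD 7970),
# not figures taken from KernelAnalyzer or from this thread.
MAX_WAVES_PER_SIMD = 10   # assumed wavefront slots per SIMD
SGPRS_PER_SIMD = 512      # assumed scalar register budget per SIMD
SGPR_GRANULE = 8          # assumed SGPR allocation block size
VGPRS_PER_SIMD = 256      # assumed vector registers available per lane
VGPR_GRANULE = 4          # assumed VGPR allocation block size


def round_up(value: int, granule: int) -> int:
    """Round value up to the next multiple of granule (at least one granule)."""
    value = max(value, 1)
    return ((value + granule - 1) // granule) * granule


def waves_per_simd(num_sgprs: int, num_vgprs: int) -> int:
    """Estimate how many wavefronts can be resident on one SIMD,
    limited only by scalar and vector register usage."""
    by_sgpr = SGPRS_PER_SIMD // round_up(num_sgprs, SGPR_GRANULE)
    by_vgpr = VGPRS_PER_SIMD // round_up(num_vgprs, VGPR_GRANULE)
    return min(MAX_WAVES_PER_SIMD, by_sgpr, by_vgpr)


if __name__ == "__main__":
    # One extra SGPR only changes occupancy when it crosses an allocation block.
    for sgprs in (40, 48, 49, 56, 57):
        print(sgprs, waves_per_simd(sgprs, num_vgprs=20))
    # Prints: 40 10, 48 10, 49 9, 56 9, 57 8 (one pair per line)
```

Under these assumptions there is no fixed exchange rate between code bytes and SGPRs: going from 48 to 49 SGPRs costs a wavefront, while going from 49 to 56 costs nothing. The model also only counts how many wavefronts *can* be resident. As described above, more resident wavefronts can thrash the caches, so a higher number here is not automatically faster.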

7970 - less codeLenInByte or NumSgprs