
diapolo
Adept I

7970 - less codeLenInByte or NumSgprs

Both values are checkable via KernelAnalyzer, so here is the question:

Is it better to optimize an OpenCL kernel for lower NumSgprs usage or for shorter compiled kernel code (codeLenInByte)? And how many fewer code bytes would it take to outweigh the use of one additional SGPR? It would also be nice to get an explanation of what Sgprs stands for. I can only think of scalar general purpose registers, am I right?

Thanks,

Dia

1 Solution

SGPR stands for scalar general purpose register.

In most cases, optimizing for a lower NumSgprs count is the better choice. However, in some cases you can over-optimize and cause a performance degradation. This depends on the algorithm and the implementation, and it is not something that will always work in the general case. For example, if your implementation has good caching behavior and you optimize to lower the GPR count, more wavefronts can be in flight at once, and those extra wavefronts can thrash the caches and give you worse performance.
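
To make the link between register count and wavefronts in flight concrete, here is a rough sketch in Python. It is not taken from KernelAnalyzer or any AMD tool. The per-SIMD figures (512 SGPRs, at most 10 wavefronts, allocation in blocks of 8) are assumptions for a GCN-class part such as the 7970, and the function name is made up for illustration, so check the numbers against AMD's documentation for your device.

```python
# Assumed per-SIMD limits for a GCN-class GPU such as the 7970.
# These values are illustrative assumptions, not figures from this thread.
SGPRS_PER_SIMD = 512         # assumed scalar register file size per SIMD
MAX_WAVES_PER_SIMD = 10      # assumed hardware cap on resident wavefronts
SGPR_ALLOC_GRANULARITY = 8   # assumed register allocation granularity


def waves_limited_by_sgprs(num_sgprs: int) -> int:
    """Estimate how many wavefronts per SIMD the SGPR budget allows."""
    if num_sgprs <= 0:
        raise ValueError("num_sgprs must be positive")
    # Round the request up to the assumed allocation granularity.
    blocks = -(-num_sgprs // SGPR_ALLOC_GRANULARITY)
    allocated = blocks * SGPR_ALLOC_GRANULARITY
    # The register file is shared by all resident wavefronts on the SIMD.
    return min(MAX_WAVES_PER_SIMD, SGPRS_PER_SIMD // allocated)


if __name__ == "__main__":
    for n in (24, 48, 56, 64, 104):
        print(f"NumSgprs={n:3d} -> up to {waves_limited_by_sgprs(n)} wavefronts per SIMD")
```

Under these assumptions, saving a register only changes anything when it crosses an allocation threshold, and the vector register count or local memory usage may be the real limit instead. Whether more wavefronts actually help is exactly the cache question raised above, so measure rather than trusting the estimate.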
