Both values can be checked in KernelAnalyzer, so here is my question:


Is it better to optimize an OpenCL kernel for lower NumSgprs usage or for shorter compiled kernel code? And how many fewer codeBytes would it take to outweigh an increase of one in NumSgprs? Also, it would be nice to get an explanation of what Sgprs stands for. I can only think of "scalar general purpose registers"; am I right?