The OpenCL compiler seems to be quite unstable, in that a small change to the code can make the number of VGPRs used change seemingly at random.
Sometimes the number of VGPRs even increases after some code is removed. This is unpredictable and very annoying, especially when the kernel sits right at the edge of a KernelOccupancy threshold, where a single extra VGPR reduces efficiency dramatically.
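To illustrate what I mean by being at the edge, here is a rough sketch of the arithmetic. It assumes GCN-style limits (a budget of 256 VGPRs per work-item lane on each SIMD and at most 10 resident wavefronts per SIMD); these constants and the helper name `waves_limited_by_vgprs` are my own assumptions for illustration, and the sketch ignores SGPRs, LDS, and register allocation granularity.

```c
/* Assumed GCN-style limits; other architectures use different numbers. */
#define VGPRS_PER_SIMD_LANE 256 /* assumption: VGPR budget per work-item lane */
#define MAX_WAVES_PER_SIMD   10 /* assumption: cap on resident wavefronts per SIMD */

/* Hypothetical helper: wavefronts per SIMD as limited by VGPR usage alone. */
static int waves_limited_by_vgprs(int vgprs_per_work_item)
{
    if (vgprs_per_work_item <= 0)
        return MAX_WAVES_PER_SIMD; /* no VGPR pressure, so only the cap applies */

    /* Integer division: each wavefront needs its own full set of registers. */
    int waves = VGPRS_PER_SIMD_LANE / vgprs_per_work_item;
    return waves > MAX_WAVES_PER_SIMD ? MAX_WAVES_PER_SIMD : waves;
}
```

Under these assumptions, a kernel using 64 VGPRs can keep 4 wavefronts resident per SIMD, while one using 65 VGPRs drops to 3, which is the kind of cliff I keep running into.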
I wonder whether AMD provides a compiler option that optimizes for (or limits) the number of registers used.
Thank you in advance.