Hi everyone,
I'm writing a few kernels that use more VGPRs than they should need. I would like to hand-tune the code while staying in C. Unfortunately, the compiled code uses more registers than I anticipate, despite my best efforts to write C that should map efficiently to assembly.
For example, if I write something like this:
void do_something(int* x, int* y, int* tmp0, int* tmp1, int* result)
{
    *tmp0 = *x + *y;
    *tmp1 = *tmp0 + *x;
    *result = *tmp1;
}
I would expect this to give me fairly tight control over register usage: the main kernel body allocates the temporary values and passes pointers to them into the function. In practice this doesn't always work, since register use still comes out higher than I anticipate, and I'm not sure why. Is there a good way to get complete control over VGPR use without resorting to lower-level code? Are there any general guidelines for reducing register use?
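To make the calling pattern concrete, here is a minimal sketch of what I mean, written as plain C rather than an actual kernel. The caller name kernel_body is made up for illustration, and the real kernels are more involved than this:

```c
/* Helper from the post: every intermediate value lives behind a pointer. */
static void do_something(int* x, int* y, int* tmp0, int* tmp1, int* result)
{
    *tmp0 = *x + *y;     /* first temporary */
    *tmp1 = *tmp0 + *x;  /* second temporary reuses the first */
    *result = *tmp1;
}

/* Hypothetical caller standing in for the main kernel body: it owns the
 * temporaries and only hands their addresses to the helper. */
int kernel_body(int x, int y)
{
    int tmp0, tmp1, result;
    do_something(&x, &y, &tmp0, &tmp1, &result);
    return result;
}
```

The idea is that tmp0, tmp1 and result are declared exactly once in the caller, so I would expect the register count to follow directly from those declarations.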