I'm writing a few kernels that use more VGPRs than they should need. I am looking to hand-tune the code while remaining in C. Unfortunately, I find that more registers are used than I anticipate, despite my best efforts to write C code that should map efficiently to assembly.
For example, if I write something like this:
void do_something(int* x, int* y, int* tmp0, int* tmp1, int* result)
{
    *tmp0 = *x + *y;
    *tmp1 = *tmp0 + *x;
    *result = *tmp1;
}
I would anticipate that I would be able to control register usage quite well. The main kernel body would allocate the temporary values, and just use pointers to call the function. This doesn't always work as you would expect, and I'm not sure why. Is there a good way to have complete control over VGPR use without resorting to lower-level code? Any general guidelines on reducing register use?
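To make the calling pattern concrete, here is a minimal sketch in plain C (not OpenCL C, and with no kernel qualifiers). The caller name kernel_body is hypothetical and only illustrates the idea of the main body owning the temporaries and handing out pointers to them:

```c
/* The helper from the question: all temporaries live in the caller. */
static void do_something(int* x, int* y, int* tmp0, int* tmp1, int* result)
{
    *tmp0 = *x + *y;
    *tmp1 = *tmp0 + *x;
    *result = *tmp1;
}

/* Hypothetical stand-in for the main kernel body: it allocates the
   temporaries itself and passes their addresses to the helper. */
static int kernel_body(int x, int y)
{
    int tmp0, tmp1, result;
    do_something(&x, &y, &tmp0, &tmp1, &result);
    return result;
}
```

Whether tmp0 and tmp1 actually end up in separate registers is up to the compiler once the helper is inlined, which is part of what I am asking about.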
A few things I can think of are:
1. Try to create explicit scopes in your kernel by using additional curly braces. This may reduce the number of VGPRs required at any point, because variables go out of scope as soon as they are no longer needed.
2. If possible, reuse variables and keep the number of variables to a minimum.
3. Check whether some variables can be made const. The compiler can often fold them into the instructions that use them, so they may not need a register at all.
4. You can even consider keeping some variables in LDS in case your register pressure is very high. Keep in mind LDS throughput and bank conflict issues, though.
5. Check the optimization options available when building the kernel; they may also help in reducing VGPRs.
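As a rough illustration of points 1 and 3, here is a small sketch in plain C rather than OpenCL C. The function name scoped_sum and its two stages are made up for the example. Each temporary lives inside its own pair of braces, and the inputs are const-qualified. Treat the braces as a hint only: a compiler that does its own liveness analysis may produce the same register allocation with or without them.

```c
#include <stddef.h>

/* Hypothetical example: two independent stages, each with its own scope. */
static int scoped_sum(const int *x, const int *y, size_t n)
{
    int result = 0;

    {   /* Stage 1: 'a' exists only inside this block. */
        int a = 0;
        for (size_t i = 0; i < n; ++i)
            a += x[i] + y[i];
        result += a;
    }

    {   /* Stage 2: 'b' is declared only after 'a' has gone out of scope. */
        int b = 0;
        for (size_t i = 0; i < n; ++i)
            b += x[i] * y[i];
        result += b;
    }

    return result;
}
```

To see whether a change like this has any effect, compare the VGPR count reported for the kernel before and after.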
Someone else can probably add more.