Reducing Register Usage

Discussion created by aj_guillon on Oct 31, 2010
Latest reply on Nov 14, 2010 by himanshu.gautam

The register usage for one of my main kernels seems rather high: as you can see in one of my other posts, it is 90 GPRs at the moment. I'm not sure what I can do to help the compiler reduce that usage (I have already set an explicit work-group size of 64, as suggested in the guide), since I expect the compiler to do its own optimizations and recycle registers as quickly as possible without me having to worry about it too much.

Since functions are inlined anyway, I would expect that if I wrote something like:

float calc_some(float x, float y, float z) { /* ... */ }

the x, y and z floats would never really occupy registers of their own. Instead, the compiler would see that it can optimize them away and substitute whatever I passed as an argument (already in a register, i.e. no extra registers used). I would furthermore expect that if I do something like:

typedef struct { float a; float b; float c; } bundle_results_t;

bundle_results_t some_calcs(float x, float y, float z) { /* ... */ }

where I return a whole bunch of results, the compiler should recognize that it can collapse this expression: rather than storing the results, it can see where I use them and substitute the corresponding expression in place (especially since these functions are inlined).
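To make the expectation concrete, here is a minimal sketch written in plain C (the OpenCL C version would look the same). The function bodies are invented purely for illustration, since my real ones are not shown here, and with_helpers, by_hand and check_collapse are hypothetical names. What I am hoping is that, after inlining, the first version costs no more registers than the second; the code only shows that the two compute the same value, not what any particular compiler actually does with them.

```c
#include <assert.h>

typedef struct { float a; float b; float c; } bundle_results_t;

/* Hypothetical body: the real calc_some is not shown in this post. */
static inline float calc_some(float x, float y, float z)
{
    return x * y + z;
}

/* Hypothetical body: the real some_calcs is not shown either. */
static inline bundle_results_t some_calcs(float x, float y, float z)
{
    bundle_results_t r;
    r.a = x + y;
    r.b = y * z;
    r.c = x - z;
    return r;
}

/* Written with the helpers, the way I would like to write my kernel. */
static inline float with_helpers(float p, float q)
{
    float t = calc_some(p, q, p);
    bundle_results_t r = some_calcs(t, p, q);
    return r.a * r.b;   /* r.c is never read, so it should cost nothing */
}

/* What I hope the compiler reduces the version above to: no parameter
   copies, no struct, and no computation of the unused field c. */
static inline float by_hand(float p, float q)
{
    float t = p * q + p;
    return (t + p) * (p * q);
}

/* Both forms produce the same value for a sample input. */
static void check_collapse(void)
{
    assert(with_helpers(2.0f, 3.0f) == by_hand(2.0f, 3.0f));
}
```

Calling check_collapse() from a host-side test confirms the two forms agree; whether they also agree in GPR count is exactly what I am asking about.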

Furthermore, I would expect chains of such calls to be handled optimally, i.e. inlined functions calling inlined functions should just collapse into rather simple (but long) expressions.
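As a small sketch of the kind of chain I mean (again plain C, with made-up helper names and bodies that are not from my real kernel): three levels of inlined helpers that I would expect to flatten into the single expression in scaled_norm2_by_hand.

```c
#include <assert.h>

/* Hypothetical helpers, each one calling the previous one. */
static inline float square(float v)
{
    return v * v;
}

static inline float sum_of_squares(float x, float y)
{
    return square(x) + square(y);
}

static inline float scaled_norm2(float x, float y, float s)
{
    return s * sum_of_squares(x, y);
}

/* The single long expression I expect the chain to collapse into. */
static inline float scaled_norm2_by_hand(float x, float y, float s)
{
    return s * (x * x + y * y);
}

/* The chained and the flattened form agree on a sample input. */
static void check_chain(void)
{
    assert(scaled_norm2(3.0f, 4.0f, 2.0f) ==
           scaled_norm2_by_hand(3.0f, 4.0f, 2.0f));
}
```

The check only demonstrates that both forms give the same result; it says nothing about how many registers either one ends up using on the device.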

Please let me know if what I've described is currently done by the compiler, and how much I can really reduce my register usage.  It would be very handy if the OpenCL kernel compiler were to annotate my code, so that I can see exactly where my register usage comes from.  Any other hints are greatly appreciated!