Currently, the AMD OpenCL compiler doesn't provide any compiler flag that directly controls VGPR usage. However, the suggestions below may influence how the compiler optimizes register usage. Some experimentation is required to see the effects and choose the best options.
- set a proper work-group size at compile time using the reqd_work_group_size kernel attribute
- loop unrolling (#pragma unroll): a large unroll factor may increase register usage
- for trivial or low-cost computations, recomputing a result is sometimes more efficient than storing it in a variable, because it can reduce register usage
- use private variables of vector types carefully, because they can greatly increase register usage (by a factor of the vector length)
- try to avoid long-lived register variables (i.e. where register contents must be preserved for a long duration): they increase the live register count and also the total register count
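
As a rough illustration of the suggestions above, here is a minimal sketch of a kernel that applies them. The kernel name (weighted_row_sum), the work-group size of 64, the unroll factor of 2 and the WG_ATTR / host_gid shim are all assumptions made up for this example; they are not taken from any AMD sample. The #else branch exists only so the same code can be compiled as plain C to check the arithmetic on the host. Whether any of this actually lowers the VGPR count for a given kernel still has to be measured.

```c
/* Illustrative sketch only: kernel name, sizes and the host shim are assumptions. */
#ifdef __OPENCL_VERSION__
/* OpenCL C build: fix the work-group size at compile time. 64 is an arbitrary
   example value; the host must enqueue the kernel with the same local size. */
#define WG_ATTR __attribute__((reqd_work_group_size(64, 1, 1)))
#else
/* Plain C build, used only to check the arithmetic on the host. */
#include <stddef.h>
#define WG_ATTR
#define __kernel
#define __global
static size_t host_gid;                 /* stand-in for the work-item id */
#define get_global_id(dim) (host_gid)
#endif

__kernel WG_ATTR
void weighted_row_sum(__global const float *in, __global float *out, int width)
{
    const size_t gid = get_global_id(0);
    float acc = 0.0f;  /* one scalar accumulator, not a wide private vector */

    /* Modest unroll factor; a larger factor may raise register usage. */
    #pragma unroll 2
    for (int i = 0; i < width; ++i) {
        /* The index and the weight are cheap, so they are recomputed in each
           iteration rather than kept in extra long-lived variables. */
        acc += in[gid * (size_t)width + (size_t)i] * (float)(i + 1);
    }
    out[gid] = acc;
}
```

When experimenting, change one thing at a time (work-group size, unroll factor, vector width) and compare the register counts reported for each build.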
[ Note: The "CodeXLAnalyzer Command Line Interface" in CodeXL can be used to generate a live register analysis report, which shows the register usage of an OpenCL kernel throughout its execution (at the ISA level). For more information, please check the CodeXL user guide. ]
Thanks.