

Adept I

Any way to avoid using too many VGPRs?

Is there anything like CUDA's "register" keyword that hints the compiler to keep the value of a variable in a single register, instead of using many registers for its temporary values? I tried "volatile", but it doesn't always work.

Or is there an OpenCL compile option like CUDA's "-maxrregcount"?

Thanks in advance.

1 Reply
Big Boss

Currently, the AMD OpenCL compiler doesn't provide any flag that directly controls VGPR usage. However, the suggestions below may influence how the compiler allocates registers. Some experimentation is required to see their effects and choose the best options.

  • set a proper work-group size at compile time using the reqd_work_group_size kernel attribute
  • loop unrolling (#pragma unroll): a large unroll factor may increase register usage
  • for trivial or low-cost computations, recomputing a result is sometimes more efficient than storing it in a variable, because it can reduce register usage
  • for vector types, use private variables carefully, because they may greatly increase register usage (by a factor of the vector length)
  • try to avoid long-lived register variables (i.e. where register contents must be preserved for a long duration): they increase the live register count and also the total register count
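
As a rough illustration of the first three suggestions, here is a minimal kernel sketch. The kernel name scale_sum, the work-group size of 64, and the unroll factor of 2 are all assumptions made up for this example, and the #ifdef shim exists only so the same source also compiles as plain C for a quick host-side check. Whether any of this actually lowers the VGPR count for a given kernel has to be measured.

```c
/* OpenCL C kernel sketch. The shim in the #else branch is an assumption for
   this example only: it lets the file compile as plain C on the host. */
#ifdef __OPENCL_VERSION__
#define WG_SIZE_HINT __attribute__((reqd_work_group_size(64, 1, 1)))
#else
#include <stddef.h>
#define __kernel
#define __global
#define WG_SIZE_HINT
static size_t host_gid;  /* emulated work-item id, set by the host test */
static size_t get_global_id(unsigned dim) { (void)dim; return host_gid; }
#endif

/* Hypothetical kernel: out[i] = sum over k = 0..3 of in[i] * (scale * k + bias).
   The fixed work-group size tells the compiler how many work-items it must
   support, which may affect how it budgets registers. */
__kernel WG_SIZE_HINT
void scale_sum(__global const float *in, __global float *out,
               float scale, float bias)
{
    size_t gid = get_global_id(0);
    float x = in[gid];
    float acc = 0.0f;

    /* Modest unroll factor; a larger one may raise register usage.
       Host C compilers that do not know this pragma ignore it. */
    #pragma unroll 2
    for (int k = 0; k < 4; ++k) {
        /* Recompute the cheap coefficient in each iteration instead of
           keeping four precomputed coefficients live across the loop. */
        float coeff = scale * (float)k + bias;
        acc += x * coeff;
    }
    out[gid] = acc;
}
```

When launching such a kernel, the local work size passed to clEnqueueNDRangeKernel must match the size given in reqd_work_group_size, otherwise the enqueue fails.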

       [ Note: The "CodeXLAnalyzer Command Line Interface" in CodeXL can be used to generate a live register analysis report, which shows the register usage of an OpenCL kernel throughout its execution (at the ISA level). For more information, please check the CodeXL user guide. ]