Is there anything in OpenCL like CUDA's "register" keyword, which hints to the compiler that a variable's value should be kept in a single register instead of spreading its temporary values across many registers? I tried "volatile", but it doesn't always work.
Or is there an OpenCL compile option like CUDA's "-maxrregcount"?
Thanks in advance.
Currently, the AMD OpenCL compiler doesn't provide any compiler flag that directly controls VGPR usage. However, below are some suggestions that may influence how the compiler allocates registers. Some experimentation is required to see their effects and choose the best options.
[ Note: The "CodeXLAnalyzer Command Line Interface" in CodeXL can be used to generate a live register analysis report, which shows the register usage of an OpenCL kernel throughout its execution (at the ISA level). For more information, please check the CodeXL user guide. ]
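As an illustration of the kind of source-level change worth experimenting with, here is a minimal OpenCL C sketch. It is an assumption on my part and is not taken from the suggestions above: the kernel name, its arguments, and the work-group size of 64 are all hypothetical. It uses the standard reqd_work_group_size attribute, which tells the compiler the exact work-group size and may let it budget registers differently, and it keeps temporaries in a narrow scope so their live ranges stay short. Whether either change actually lowers VGPR usage has to be checked in the ISA or the live register report.

```c
// Hypothetical kernel: the name, arguments, and the size 64 are illustrative only.
// reqd_work_group_size is a standard OpenCL C attribute; the host must then
// enqueue the kernel with exactly this local work size.
__kernel __attribute__((reqd_work_group_size(64, 1, 1)))
void scale_add(__global const float *in,
               __global float *out,
               const float a,
               const float b)
{
    const size_t gid = get_global_id(0);

    // Keep the temporary in the narrowest possible scope so that it is
    // live only for the few instructions that actually need it.
    {
        const float x = in[gid];
        out[gid] = a * x + b;
    }
}
```

With this attribute, the local work size passed to clEnqueueNDRangeKernel has to be 64 in the first dimension, otherwise the enqueue fails, so it only suits kernels whose work-group size is fixed in advance.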
Thanks.