I am developing an OpenCL application for an AMD Radeon HD 6970 using AMD APP SDK v2.7.
I need to be able to control workgroup occupancy without introducing runtime overhead. For example, suppose I declare the following:
__private float occupancy_correction;
I want the OpenCL compiler to leave this variable alone and keep the registers it needs allocated, so that they count toward the kernel's register usage when it is launched. However, I've noticed that because the variable is dead code (it is never used), the compiler optimizes it away.
Is it possible to trick the compiler into keeping such code, so that the kernel uses more registers than it strictly needs?