Outsmarting the OpenCL Compiler on AMD GPUs

Question asked by carlodelmundo on Jan 21, 2013
I am developing an OpenCL application on an AMD Radeon HD 6970 using AMD APP SDK v2.7.


I need to control the occupancy of my kernel (how many workgroups are resident on a compute unit at once) without introducing runtime overhead. For example, suppose I declare the following inside the kernel:


__private float occupancy_correction[20];


I want the OpenCL compiler to leave this array alone and reserve registers for it when the kernel is launched. I've noticed, however, that because the array is never used, the compiler treats it as dead code and optimizes it out.


Is it possible to trick the compiler into keeping this code, so that the kernel uses more registers than it strictly needs?
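
To make the question concrete, here is a minimal sketch of the two situations. The kernel names padded_dead and padded_live, the seed and never_true arguments, and the host-side shims are all hypothetical and only for illustration. The second variant makes the array depend on a runtime argument and feeds it into a store guarded by a flag the host would always pass as 0. That should stop the array being removed as dead code, but it is an assumption that it raises register use on any particular device: the compiler may still sink the work into the branch or place the array in scratch memory instead of registers, and the guard itself adds a small cost.

```c
/* Host-side shims: used only when this file is compiled as plain C to
   check the logic. An OpenCL C compiler defines __OPENCL_VERSION__ and
   skips this section. Everything in it is a hypothetical stand-in. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __private
static size_t host_gid = 0;
static size_t get_global_id(unsigned dim) { (void)dim; return host_gid; }
#endif

#define PAD 20  /* number of extra private floats, as in the question */

/* Variant 1: the array is written but never read, so it is dead code and
   the compiler is free to remove it entirely. */
__kernel void padded_dead(__global float *out)
{
    __private float occupancy_correction[PAD];
    for (int i = 0; i < PAD; ++i)
        occupancy_correction[i] = 0.0f;

    out[get_global_id(0)] = 1.0f;   /* the "real" work of the kernel */
}

/* Variant 2: every element depends on a runtime argument, and the array
   feeds a store that the compiler cannot prove is never executed. */
__kernel void padded_live(__global float *out, float seed, int never_true)
{
    __private float occupancy_correction[PAD];
    for (int i = 0; i < PAD; ++i)
        occupancy_correction[i] = seed + (float)i;

    out[get_global_id(0)] = 1.0f;   /* the "real" work of the kernel */

    if (never_true) {               /* the host always passes 0 here */
        float sum = 0.0f;
        for (int i = 0; i < PAD; ++i)
            sum += occupancy_correction[i];
        out[get_global_id(0)] = sum;
    }
}
```

Whether the second variant changes the register count can only be confirmed by inspecting the compiler's reported resource usage for the kernel, for example in the ISA dump or the profiler's occupancy view.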