How can I prevent the compiler from keeping intermediate data in VGPRs just because it expects that data to be useful later?
In my program each work-item (kernel instance) stores an 8x8 block of 8-bit pixels to local memory; the pixels are packed four at a time into uchar4.
The position of a quad-pixel is array_in_LDS[get_local_size(0)*qpixel_num + get_local_id(0)].
At one point I write the quad-pixels to local memory, and some time later I load them back from local memory.
Between those two points the compiler keeps every index in a VGPR, so 16 VGPRs are used in this example (one per uchar4 of the 8x8 block).
There is a workaround:
if I calculate the index differently at store time, for example mad24(get_local_size(0), qpixel_num, get_local_id(0)),
then no additional VGPRs are used.
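To make the two index forms concrete, here is a minimal host-side sketch in plain C (not kernel code). It only shows that both expressions give the same index for the 16 quad-pixels of each work-item; it says nothing about how the compiler allocates registers. The names mad24_emul, index_plain, index_mad24 and indices_match are hypothetical helpers invented for this illustration, and mad24_emul is my assumption of how mad24 behaves for unsigned operands that fit in 24 bits.

```c
#include <stdint.h>

/* Hypothetical host-side stand-in for OpenCL's mad24 on unsigned operands:
   multiply the low 24 bits of a and b, then add c (32-bit wraparound). */
static uint32_t mad24_emul(uint32_t a, uint32_t b, uint32_t c)
{
    return (a & 0xFFFFFFu) * (b & 0xFFFFFFu) + c;
}

/* Index as written at load time: local_size * qpixel_num + local_id. */
static uint32_t index_plain(uint32_t local_size, uint32_t qpixel_num,
                            uint32_t local_id)
{
    return local_size * qpixel_num + local_id;
}

/* Index as written at store time in the workaround. */
static uint32_t index_mad24(uint32_t local_size, uint32_t qpixel_num,
                            uint32_t local_id)
{
    return mad24_emul(local_size, qpixel_num, local_id);
}

/* Returns 1 if both forms agree for all 16 quad-pixels (8x8 pixels / 4)
   of every work-item in a work-group of the given size, else 0. */
static int indices_match(uint32_t local_size)
{
    for (uint32_t q = 0; q < 16; ++q) {
        for (uint32_t id = 0; id < local_size; ++id) {
            if (index_plain(local_size, q, id) != index_mad24(local_size, q, id))
                return 0;
        }
    }
    return 1;
}
```

So the two forms differ only in how they are written, which is why I call this fooling the compiler: it apparently no longer recognizes the store-time and load-time indices as the same value.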
Calculating the index costs ~5 ticks, or even 1 with mad24, while VGPRs are always expensive.
Is there any way to stop the compiler from keeping such easily recalculated data in registers, without having to fool it? Maybe some directive, or some option to clBuildProgram?