You could declare the private array as volatile; the compiler will not optimize it away.
That said, this is not an elegant way to control occupancy, and the performance might not be portable across devices.
Using fewer private registers is generally better, since it gives you more occupancy.
Is there a specific reason for using dummy private registers?
The volatile keyword works when I make scalar values into arrays. e.g.:
float theta = ...
volatile float theta;
theta = ...
In the example above, the compiler doesn't optimize out the registers, which is the behavior I'm looking for. However, this only works when the variable (such as theta) is actually referenced by the code.
Dead code such as the declaration below:
volatile __private float occupancy_correction;
... is still optimized out by the compiler. Is there another way to control occupancy during execution? I'm profiling my kernel code in a set of distinct stages, and the conditions (such as occupancy) of each stage must match the conditions when the full kernel is profiled.
It appears that I can control occupancy by focusing on local memory usage rather than registers. For example, the following dead code:
__local float4 occupancy_correction;
will not be optimized out by the compiler.
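To pick the size of that dummy local allocation, a rough host-side calculation may help. The sketch below is plain C, not kernel code, and it assumes local memory is the only resource limiting how many work-groups fit on a compute unit. The function names, the 64 KB per compute unit figure, and the limit of 16 work-groups used in the note afterwards are hypothetical; registers and the device's allocation granularity also affect real occupancy, so treat the result as an estimate to check against the profiler.

```c
#include <assert.h>
#include <stddef.h>

/* Upper bound on the work-groups resident on one compute unit when local
 * memory is the only limiting resource. Sizes are in bytes.
 * Returns 0 when a single work-group does not fit.
 * hw_group_limit is the device's own cap on resident work-groups
 * (a hypothetical parameter here; query or look it up for a real device). */
size_t max_groups_by_local_mem(size_t local_mem_per_cu,
                               size_t local_mem_per_group,
                               size_t hw_group_limit)
{
    assert(hw_group_limit > 0);
    if (local_mem_per_group == 0)
        return hw_group_limit;   /* local memory imposes no limit */
    size_t fit = local_mem_per_cu / local_mem_per_group;
    return fit < hw_group_limit ? fit : hw_group_limit;
}

/* Bytes of dummy local memory a profiling stage would need so that its
 * local memory footprint matches the full kernel's footprint.
 * Returns 0 when the stage already uses at least as much. */
size_t local_mem_padding(size_t full_kernel_bytes, size_t stage_bytes)
{
    return full_kernel_bytes > stage_bytes
               ? full_kernel_bytes - stage_bytes
               : 0;
}
```

With the hypothetical figures above, a full kernel using 16384 bytes of local memory per work-group would be limited to 4 resident work-groups per compute unit, and a stage that uses only 4096 bytes on its own would need 12288 bytes of padding to match that footprint.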