good morning.
i currently use 64 constant uint values in every thread of every workgroup in my kernel, like so:
__constant uint * coeff = coeff_kernel_argument;
uint coeff1 = coeff[0];
...
uint coeff63 = coeff[63];
i am delighted to see that the 64 values end up in scalar registers (i am talking about Tahiti), as confirmed with the ISA dump.
now i would like to change it to:
uint * coeff = coeff_kernel_argument + get_global_id(0) / get_local_size(0) * 64;
unfortunately this results in the coeff?? variables being stored in VGPRs, even though their values are the same for every thread of a workgroup.
i am also beginning to rewrite my kernel in AMDIL for other reasons, so a workaround that only works at the IL level would suffice.
thanks a lot.
You appear to be using the global thread id in your calculation. It varies per thread, which is why the values are put in VGPRs.
the expression "get_global_id(0) / get_local_size(0)" is constant among all threads of a workgroup, which fits perfectly with the scalar registers being a shared resource per workgroup.
get_group_id(0) is constant for all threads in a workgroup. get_global_id(0) returns a unique number for every thread across all workgroups.
"get_global_id(0) / get_local_size(0)" can't be evaluated at compile time, because its value depends on launch parameters. So the compiler does not treat it as a constant.
One workaround would be to stage the values through local memory:
__constant uint * coeff = coeff_kernel_argument;
uint coeff0 = coeff[0];
if (get_local_id(0) == 0)
{
shared_mem[0] = coeff0;
}
uint coeff1 = coeff[1];
if (get_local_id(0) == 0)
{
shared_mem[1] = coeff1;
}
...
uint coeff63 = coeff[63];
if (get_local_id(0) == 0)
{
shared_mem[63] = coeff63;
}
barrier(CLK_LOCAL_MEM_FENCE);
__local uint coeffForWorkGroup;
if (get_local_id(0) == 0)
{
coeffForWorkGroup = shared_mem[get_group_id(0)];
}
barrier(CLK_LOCAL_MEM_FENCE);
uint coeffForWorkGroupPrivateCopy = coeffForWorkGroup;