Hello all, I have a kernel where I have a uchar16 and I need to fetch a specific element of this vector based on the local id.
Suppose I have work group size = 64,
then I have the following code:
uchar16 data = // read data from a buffer
// .... some code goes here
// .... some code goes here
// .... some code goes here
int i = get_local_id(0)%16; // 0 <= i <= 15
uchar new_data = ((uchar*)&data)[i]; // dynamically indexed read of one element
This of course works, but unfortunately my kernel now uses 132 scratch registers. I can explain this amount: the compiler just puts "data" in global memory, because I have a dependent read from it, "i" is not a compile-time constant, and I also use data in many other places.
So I tried two other ways:
1) Declare one additional array and copy the data there
The limitation is that the hardware cannot address registers with a dynamic index, so arrays that are accessed with a dynamic index go to global memory, a.k.a. scratch registers. You can move them to local memory, which is much faster. Or try building a binary search tree function out of select().
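
To make the select() tree idea concrete, here is a rough sketch. It is written as plain host C so it can be compiled and checked outside a kernel. The names pick() and extract16() are made up for this example and are not from any library. pick() stands in for OpenCL's scalar select(a, b, c), which yields c ? b : a. In a real kernel the constant indices v[0] .. v[15] would be the vector components data.s0 .. data.sf. Whether the compiler then keeps everything in registers depends on the device and driver, so check the scratch register count after trying it.

```c
#include <assert.h>

typedef unsigned char uchar;

/* Hypothetical stand-in for OpenCL's scalar select(a, b, c): c ? b : a. */
static uchar pick(uchar a, uchar b, int c)
{
    return c ? b : a;
}

/* Hypothetical helper: return element i (0..15) of v.
 * Every read of v uses a constant index; the dynamic index i only
 * drives the selects. Four levels, 8 + 4 + 2 + 1 = 15 selects. */
static uchar extract16(const uchar v[16], int i)
{
    assert(i >= 0 && i < 16);

    /* One flag per bit of the index. */
    int b0 = i & 1;
    int b1 = (i >> 1) & 1;
    int b2 = (i >> 2) & 1;
    int b3 = (i >> 3) & 1;

    /* Level 1: bit 0 picks one element out of each pair. */
    uchar p0 = pick(v[0],  v[1],  b0);
    uchar p1 = pick(v[2],  v[3],  b0);
    uchar p2 = pick(v[4],  v[5],  b0);
    uchar p3 = pick(v[6],  v[7],  b0);
    uchar p4 = pick(v[8],  v[9],  b0);
    uchar p5 = pick(v[10], v[11], b0);
    uchar p6 = pick(v[12], v[13], b0);
    uchar p7 = pick(v[14], v[15], b0);

    /* Level 2: bit 1 picks within each group of four. */
    uchar q0 = pick(p0, p1, b1);
    uchar q1 = pick(p2, p3, b1);
    uchar q2 = pick(p4, p5, b1);
    uchar q3 = pick(p6, p7, b1);

    /* Level 3: bit 2 picks within each half. */
    uchar r0 = pick(q0, q1, b2);
    uchar r1 = pick(q2, q3, b2);

    /* Level 4: bit 3 picks the low or high half. */
    return pick(r0, r1, b3);
}
```

The trade-off is 15 selects per lookup instead of one indexed load, which is usually still much cheaper than a round trip through scratch memory.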