alariq
Adept I

Access arbitrary element of a vector

Hello all, I have a kernel where I have a uchar16, and I need to fetch a specific element of this vector based on the local ID.

Suppose I have a work-group size of 64.

Then I have the following code:

uchar16 data = // read data from a buffer  

// .... some code goes here

// .... some code goes here

// .... some code goes here

int i = get_local_id(0)%16; // 0 <= i <= 15

uchar16 new_data = ((uchar*)&data)[i];

This works, of course, but my kernel suddenly uses 132 scratch registers. (I can explain this amount: the compiler just puts "data" in global memory because I have a dependent read from it, "i" is not a compile-time constant, and I also use data in many other places.)

So I tried two other ways:

1) Declare an additional array and copy the data there:

uchar t[16];  t[0] = data.s0; ..... t[15] = data.sF;

then do: new_data = t[i];

but this is even worse (in addition to the 132 scratch registers, my kernel uses more VGPRs).

So I tried this instead:

2) Use shuffle to get the component and then splat it:

uchar2 tt = shuffle(data, (uchar2)(i,0)); // I use uchar2 because the function does not accept scalars

new_data = (uchar16)(tt.x); 

Still the same bad results (even more registers plus the 132 scratch registers).

However, if I use a compile-time constant in case 2), everything is fine. I assume the compiler just eliminates the function call.

Is there a good way to do what I want with a dynamic "i" without a big overhead? I can guarantee that it will be in the [0,15] range 🙂

Thanks

nou
Exemplar

There is a hardware limitation: registers cannot be addressed with a dynamic index. So arrays that are accessed with a dynamic index go to global memory (a.k.a. scratch registers). You can move them to local memory, which is much faster. Or try building a binary search tree function out of select() calls:

float4 v;

int i;

// select(a, b, c) returns b when c is true, otherwise a
float b = select(select(v.x, v.y, i == 1), select(v.z, v.w, i == 3), i >= 2);
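For the uchar16 case in the question, the same tree idea needs four levels (15 two-way picks). Below is a minimal sketch of that, written as a plain-C model so it can be checked on the host. The names uchar16_model, pick and extract16 are made up for this illustration; in OpenCL C you would read data.s0 .. data.sF directly and use the ternary operator or select() for each pick. Whether this actually keeps the vector out of scratch depends on the compiler, so treat it as something to try and measure, not a guaranteed fix.

```c
#include <stdint.h>

/* Plain-C stand-in for OpenCL's uchar16 (hypothetical type for this sketch).
   s[0]..s[15] play the role of .s0 .. .sF and are only ever read with
   compile-time-constant indices below. */
typedef struct { uint8_t s[16]; } uchar16_model;

/* Two-way pick with the argument order of scalar select(a, b, c):
   returns b when c is non-zero, otherwise a. */
static uint8_t pick(uint8_t a, uint8_t b, int c) {
    return c ? b : a;
}

/* Return element i of v, assuming 0 <= i <= 15. */
static uint8_t extract16(uchar16_model v, int i) {
    int b0 = i & 1, b1 = i & 2, b2 = i & 4, b3 = i & 8;

    /* Level 1: bit 0 picks within each pair (16 -> 8 candidates). */
    uint8_t p0 = pick(v.s[0],  v.s[1],  b0);
    uint8_t p1 = pick(v.s[2],  v.s[3],  b0);
    uint8_t p2 = pick(v.s[4],  v.s[5],  b0);
    uint8_t p3 = pick(v.s[6],  v.s[7],  b0);
    uint8_t p4 = pick(v.s[8],  v.s[9],  b0);
    uint8_t p5 = pick(v.s[10], v.s[11], b0);
    uint8_t p6 = pick(v.s[12], v.s[13], b0);
    uint8_t p7 = pick(v.s[14], v.s[15], b0);

    /* Level 2: bit 1 (8 -> 4 candidates). */
    uint8_t q0 = pick(p0, p1, b1);
    uint8_t q1 = pick(p2, p3, b1);
    uint8_t q2 = pick(p4, p5, b1);
    uint8_t q3 = pick(p6, p7, b1);

    /* Level 3: bit 2 (4 -> 2 candidates). */
    uint8_t r0 = pick(q0, q1, b2);
    uint8_t r1 = pick(q2, q3, b2);

    /* Level 4: bit 3 chooses between the lower and upper half. */
    return pick(r0, r1, b3);
}
```

With i = get_local_id(0) % 16, the result of extract16(data, i) can then be splatted the same way as in the original post, i.e. new_data = (uchar16)(value). Every pick is unconditional, so all work-items execute the same 15 operations regardless of i.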

Thanks, nou

I was using local memory before but was looking for a better way. I'll try your binary tree suggestion.

One more question.

Am I right in assuming that the compiler always puts my uchar16 var into global memory if I have an instruction of the following form:

var2 = shuffle(var, mask); // in case mask is not a compile-time constant

?
