3 Replies Latest reply on Nov 22, 2012 10:13 AM by alariq

    Access arbitrary element of a vector

    alariq

      Hello, all. I have a kernel in which I hold a uchar16, and I need to fetch a specific element of this vector based on the local ID.

      Suppose I have a work-group size of 64.

      Then I have the following code:

      uchar16 data = // read data from a buffer  

      // .... some code goes here

      // .... some code goes here

      // .... some code goes here

       

      int i = get_local_id(0)%16; // 0 <= i <= 15

      uchar16 new_data = ((uchar*)&data)[i];

       

      This of course works, but my kernel suddenly uses 132 scratch registers. I can explain that number: the compiler simply puts "data" in global memory because I have a dependent read from it, "i" is not a compile-time constant, and I also use data in many other places.

       

      So I tried two other ways:

       

      1) Declare one additional array and copy the data there:

      uchar t[16];  t[0] = data.s0; ..... t[15] = data.sf;

      then do: new_data = t[i];

       

      But this is even worse (in addition to the 132 scratch registers, my kernel uses more VGPRs).

      So I tried this instead:

      2) Use shuffle to get the component and then splat it:

      uchar2 tt = shuffle(data, (uchar2)(i,0)); // I use uchar2 because the function does not accept scalars

      new_data = (uchar16)(tt.x); 

      Still the same bad results (even more registers plus the 132 scratch registers).

      However, if I use a compile-time constant in case 2), everything is fine. I assume the compiler simply eliminates the function call.

       

      Is there a good way to do what I want with a dynamic "i" without a big overhead? I can guarantee that it will be in the [0,15] range :-)

      Thanks

       

       

       

       

       

       

       

       

       

        • Re: Access arbitrary element of a vector
          nou

          The limitation is that the hardware cannot address registers with a dynamic index. So arrays that are accessed with a dynamic index go to global memory, a.k.a. scratch registers. You can move them to local memory, which is much faster. Or try building a binary search tree from the select() function.

          float4 v;
          int i;
          // select(a, b, c) returns b when c is true, otherwise a
          float b = select(select(v.x, v.y, i == 1), select(v.z, v.w, i == 3), i >= 2);
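          Extended to sixteen components, the same tree might look like the sketch below. It is a plain-C model rather than kernel code, so it can be compiled and checked on the host: the array v[] stands in for data.s0 .. data.sf, and the hypothetical helpers sel() and pick16() are illustrative names, not part of OpenCL. Only constant subscripts are used, which is the property that should let a compiler keep everything in registers; whether a given compiler actually does so is an assumption to verify against the reported register usage.

```c
#include <stdint.h>

/* Model of OpenCL's scalar select(a, b, c): returns b when c is true, else a. */
static inline uint8_t sel(uint8_t a, uint8_t b, int c) { return c ? b : a; }

/* v[] models the sixteen components of a uchar16 (assumption: v[k] == data.sK).
   Every subscript is a compile-time constant; the dynamic index i only
   feeds the select conditions. Uses 15 selects in 4 levels. */
static uint8_t pick16(const uint8_t v[16], unsigned i)
{
    int b0 = (i & 1u) != 0;
    int b1 = (i & 2u) != 0;
    int b2 = (i & 4u) != 0;
    int b3 = (i & 8u) != 0;

    /* Level 1: choose within each pair using bit 0 of i. */
    uint8_t p0 = sel(v[0],  v[1],  b0);
    uint8_t p1 = sel(v[2],  v[3],  b0);
    uint8_t p2 = sel(v[4],  v[5],  b0);
    uint8_t p3 = sel(v[6],  v[7],  b0);
    uint8_t p4 = sel(v[8],  v[9],  b0);
    uint8_t p5 = sel(v[10], v[11], b0);
    uint8_t p6 = sel(v[12], v[13], b0);
    uint8_t p7 = sel(v[14], v[15], b0);

    /* Level 2: choose within each group of four using bit 1. */
    uint8_t q0 = sel(p0, p1, b1);
    uint8_t q1 = sel(p2, p3, b1);
    uint8_t q2 = sel(p4, p5, b1);
    uint8_t q3 = sel(p6, p7, b1);

    /* Level 3: choose within each half using bit 2. */
    uint8_t r0 = sel(q0, q1, b2);
    uint8_t r1 = sel(q2, q3, b2);

    /* Level 4: choose the half using bit 3. */
    return sel(r0, r1, b3);
}
```

          In a kernel, each sel() call would correspond to a select() (or a ternary) on the vector's components, and the result could then be splatted with (uchar16)(...).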
          
            • Re: Access arbitrary element of a vector
              alariq

              Thanks, nou.

              I was using local memory before but was looking for a better way. I'll try your binary tree suggestion.
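              One possible variation on the binary tree idea, not taken from this thread, is to treat the sixteen bytes as four 32-bit words, pick the word with a two-level select tree, and then shift the wanted byte down. The sketch below is a plain-C model so it can be checked on the host. The function name pick_byte and its parameters are hypothetical, and it assumes a little-endian layout in which byte k of the vector sits in bits 8*(k%4) .. 8*(k%4)+7 of word k/4, which is what reinterpreting a uchar16 as a uint4 would give on a little-endian device.

```c
#include <stdint.h>

/* w0..w3 model the four 32-bit words of a 16-byte vector
   (assumption: little-endian, byte k lives in word k / 4).
   i must be in [0, 15]. Uses 3 selects, one shift and one truncation. */
static uint8_t pick_byte(uint32_t w0, uint32_t w1, uint32_t w2, uint32_t w3,
                         unsigned i)
{
    unsigned word = i >> 2;                    /* which word: 0..3 */

    /* Two-level select tree over the words; no dynamic indexing. */
    uint32_t lo = (word & 1u) ? w1 : w0;       /* candidate from words 0/1 */
    uint32_t hi = (word & 1u) ? w3 : w2;       /* candidate from words 2/3 */
    uint32_t w  = (word & 2u) ? hi : lo;

    /* Shift the wanted byte into the low 8 bits and truncate. */
    return (uint8_t)(w >> ((i & 3u) * 8u));
}
```

              Compared with a tree over sixteen separate bytes, this trades twelve selects for a variable shift; whether that is actually cheaper on a given GPU is something to measure rather than assume.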

              • Re: Access arbitrary element of a vector
                alariq

                One more question.

                Am I right in assuming that the compiler always puts my uchar16 var; into global memory if I have an instruction of the following form:

                var2 = shuffle(var, mask); // in case mask is not compile time constant

                ?
