3 Replies Latest reply on Feb 27, 2014 7:35 AM by realhet

    constant array packing

    digrobot

      Hello!

      In my kernel I am using a constant array of bytes:

      __constant uchar myconst[256]  = {  0x5a, 0xe6, 0x61, 0xf4, 0x31, 0xe3, 0x85 ....................skipped values here...............};


      But when I look at the generated IL assembly code, it seems that the 8-bit array values are packed into 128-bit constant registers:

      ...
      dcl_cb cb2[16]    <=here is my array

      ...

      mov r1011, cb2[r1007.x]
      cmov_logical r1011.x___, r1008.y, r1011.y, r1011.x  <=unpacking of the byte value from the 128-bit register starts here
      cmov_logical r1011.x___, r1008.z, r1011.z, r1011.x
      cmov_logical r1011.x___, r1008.w, r1011.w, r1011.x
      iand r1006.x___, r1010.x, l15
      iadd r1006, r1006.x, l16
      ieq r1008, r1006, l12
      ishr r1011, r1011.x, l17
      cmov_logical r1011.x___, r1008.y, r1011.y, r1011.x
      cmov_logical r1011.x___, r1008.z, r1011.z, r1011.x
      cmov_logical r1011.x___, r1008.w, r1011.w, r1011.x
      ishl r1011.x___, r1011.x, l18
      ushr r1011.x___, r1011.x, l18
      mov r65._y__, r1011.x

       

      Is there any way to prevent this behavior and store only one array element per constant register? Maximum speed is my goal.

        • Re: constant array packing
          realhet

          Hi,

           

          On pre-GCN cards the register size is 128 bits, so it can't be optimized there. To eliminate the cmovs you can use a 128-bit type for the array. It's a waste of memory, though.
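
          The trade-off can be sketched on the host in plain C. This is only an illustration under assumptions: reg128 is a hypothetical stand-in for one 128-bit constant register, the byte order inside each 32-bit component is assumed to be little-endian, and all function names are made up for this example. In an OpenCL kernel the 128-bit element would presumably be a vector type such as uint4 with the value kept in .x, which is also an assumption and not something stated above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit constant register (x, y, z, w). */
typedef struct { uint32_t c[4]; } reg128;

/* Packed layout: 256 bytes stored in 16 registers, 16 bytes per register.
   Assumes byte 0 sits in the low 8 bits of component x. */
void pack_table(const uint8_t src[256], reg128 packed[16]) {
    assert(sizeof(reg128) == 16);               /* 4 x 32 bits = 128 bits */
    memset(packed, 0, 16 * sizeof(reg128));
    for (int i = 0; i < 256; ++i) {
        packed[i >> 4].c[(i >> 2) & 3] |= (uint32_t)src[i] << (8 * (i & 3));
    }
}

/* Packed lookup: pick the register, pick the component (the job of the
   cmov_logical chain), then shift and mask out one byte. */
uint8_t lookup_packed(const reg128 packed[16], uint8_t i) {
    uint32_t word = packed[i >> 4].c[(i >> 2) & 3];
    return (uint8_t)((word >> (8 * (i & 3))) & 0xFFu);
}

/* Widened layout: one table entry per 128-bit slot, value in component x. */
void widen_table(const uint8_t src[256], reg128 wide[256]) {
    memset(wide, 0, 256 * sizeof(reg128));
    for (int i = 0; i < 256; ++i) {
        wide[i].c[0] = src[i];
    }
}

/* Widened lookup: a single indexed read, no component select or shift. */
uint8_t lookup_wide(const reg128 wide[256], uint8_t i) {
    return (uint8_t)wide[i].c[0];
}

/* Returns 1 if both layouts reproduce every byte of src, else 0. */
int layouts_agree(const uint8_t src[256]) {
    reg128 packed[16];
    reg128 wide[256];
    pack_table(src, packed);
    widen_table(src, wide);
    for (int i = 0; i < 256; ++i) {
        if (lookup_packed(packed, (uint8_t)i) != src[i]) return 0;
        if (lookup_wide(wide, (uint8_t)i) != src[i]) return 0;
    }
    return 1;
}
```

          The widened table takes 256 x 16 = 4096 bytes instead of 256, which is the memory cost mentioned above; in exchange each lookup is one indexed read.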

          On GCN the register size is 32 bits at minimum and, if I remember correctly, the CAL compiler will optimize out these cmovs. To make sure, check the .isa file for s_cmov_b32 instructions.

            • Re: constant array packing
              digrobot

              Thanks for the advice. But I could not find any scalar 128-bit type in the OpenCL specification. The compiler reports an error saying that the type "long long" is "nonstandard".
              I'm working with pre-GCN cards. In general, I need a fast lookup table. Currently, byte replacement is the bottleneck of my program.