3 Replies Latest reply on Nov 3, 2009 3:14 PM by MicahVillmow

    Bad performance when moving data between private memory and local memory


      Is moving data from private memory to local memory really that time-consuming? Since I started using local memory in my kernel, my program runs much slower than before.



      __private float4 block[4];
      __local float4 local_block[16];

      // Very slow here. Why?
      local_block[local_id] = block[0];
      local_block[local_id + 1] = block[1];
      local_block[local_id + 2] = block[2];
      local_block[local_id + 3] = block[3];
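
For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same pattern. The kernel name `copy_to_local`, the `in`/`out` buffers, and the work-group size of 4 are assumptions made up for illustration, not taken from my real program. The sketch also indexes local memory with a stride of 4 per work-item so that no two work-items write the same slot; the snippet above uses `local_id + k` instead, so treat the stride as an assumption about the intended layout.

```c
// Hypothetical reproduction kernel; names and sizes are illustrative assumptions.
// Assumes a 1-D work-group of exactly 4 work-items (4 * 4 = 16 local slots).
__kernel __attribute__((reqd_work_group_size(4, 1, 1)))
void copy_to_local(__global const float4 *in, __global float4 *out)
{
    __private float4 block[4];
    __local float4 local_block[16];

    const size_t gid = get_global_id(0);
    const size_t local_id = get_local_id(0);

    // Load four float4 values per work-item into private memory.
    for (int k = 0; k < 4; ++k)
        block[k] = in[gid * 4 + k];

    // Each work-item writes its own four slots (stride 4, no overlap).
    for (int k = 0; k < 4; ++k)
        local_block[local_id * 4 + k] = block[k];

    // Make the local-memory writes visible to the whole work-group.
    barrier(CLK_LOCAL_MEM_FENCE);

    // Read back the slots written by the neighbouring work-item,
    // so the local-memory traffic cannot simply be optimized away.
    const size_t neighbour = (local_id + 1) % get_local_size(0);
    for (int k = 0; k < 4; ++k)
        out[gid * 4 + k] = local_block[neighbour * 4 + k];
}
```

Launching this with a local work size of 4 and timing it against a variant that skips `local_block` should show whether the private-to-local copy itself is the slow part.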