7 Replies Latest reply on Feb 2, 2011 8:01 PM by MicahVillmow

    Barriers when reading to local memory



      Why does copying from global into local memory on an RV730 cause a barrier after each write to the local store? The compiler seems to be smart enough to remove the following explicit barrier entirely, but I would prefer to see no additional barriers, just the one I put in.


      __kernel void main(__global float2 *dest,
                         __global const float *data)
      {
        uint X = get_global_id(0);
        uint Y = get_global_id(1);

        __local float buffer[4096];

        const uint W=1024;
        const uint H=480;
        uint row = Y * W;

        uint myID = get_local_id(0);
        buffer[myID] = data[row+myID];
      /*buffer[W+myID] = data[row+W+myID];
        barrier(CLK_LOCAL_MEM_FENCE); */
        dest[row + X] = (float2)(buffer[myID+1],buffer[myID+4]);
      }

      For the 58xx and 57xx I observed the expected output: no barrier is placed unless explicitly requested.

      Any technical reason for this behaviour?


      PS: I realized I can extend the barrier-less writes using vload/vstore, but after 4 or 8 elements a barrier is always inserted. At least this reduces the barrier count to 7/8 compared to individual floats.