Barriers when reading into local memory

Discussion created by landmann on Feb 1, 2011
Latest reply on Feb 2, 2011 by MicahVillmow


Why does copying from global into local memory on an RV730 cause a barrier to be inserted after each write to the local store? The compiler seems smart enough to remove the explicit barrier that follows altogether, but I would prefer to see only the one barrier I put in, not additional ones.


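__kernel void main(__global float2 *dest,
                   __global const float *data)
{
  uint X = get_global_id(0);
  uint Y = get_global_id(1);

  __local float buffer[4096];

  const uint W = 1024;   // row width
  const uint H = 480;    // image height (unused in this excerpt)
  uint row = Y * W;

  uint myID = get_local_id(0);
  buffer[myID] = data[row + myID];          // copy global -> local
/*buffer[W+myID] = data[row+W+myID];
  barrier(CLK_LOCAL_MEM_FENCE); */
  // reads slots written by neighbouring work-items; only well-defined
  // after a barrier(CLK_LOCAL_MEM_FENCE)
  dest[row + X] = (float2)(buffer[myID + 1], buffer[myID + 4]);
}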
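Why one explicit barrier should be enough can be seen in a sequential C model of a single work-group. This is only a sketch under stated assumptions, not how the GPU actually schedules work-items: the toy sizes `W = 16` and `LOCAL_SIZE = W` and the helper `run_work_group` are hypothetical, and the commented-out second copy and barrier are assumed to be enabled. Each loop stands for all work-items executing one statement, so the boundary between the two loops is the only place a barrier is needed.

```c
#include <assert.h>

/* Toy sizes (hypothetical, chosen only to keep the example small): W plays
 * the role of the row width in the kernel, and the work-group is assumed to
 * span one full row, so local ids run from 0 to W - 1. */
#define W 16
#define LOCAL_SIZE W

/* Sequential host-side model of ONE work-group executing the kernel with the
 * commented-out lines enabled. Each loop over `id` stands for all work-items
 * executing the same statement; the gap between the two loops is where
 * barrier(CLK_LOCAL_MEM_FENCE) sits. */
static void run_work_group(const float row_data[2 * W],
                           float out[LOCAL_SIZE][2])
{
    float buffer[2 * W];   /* stands in for the __local array */

    /* Phase 1: every work-item copies its two elements into "local" memory. */
    for (int id = 0; id < LOCAL_SIZE; ++id) {
        buffer[id] = row_data[id];
        buffer[W + id] = row_data[W + id];
    }

    /* barrier(CLK_LOCAL_MEM_FENCE) goes here: a single synchronisation point
     * is enough, because no work-item reads until every write has finished. */

    /* Phase 2: every work-item reads elements written by its neighbours. */
    for (int id = 0; id < LOCAL_SIZE; ++id) {
        assert(id + 4 < 2 * W);   /* only read slots written in phase 1 */
        out[id][0] = buffer[id + 1];
        out[id][1] = buffer[id + 4];
    }
}
```

Called with `row_data[i] = i`, work-item `id` ends up with `{id + 1, id + 4}`, values that other work-items wrote in phase 1. Extra barriers between the individual stores of phase 1 would not change this result, since no work-item reads local memory during that phase.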

On the 58xx and 57xx I observed the expected output: no barrier is placed unless explicitly requested.

Is there a technical reason for this behaviour?


PS: I realized I can extend the barrier-less writes using vload/vstore, but a barrier is still inserted after every 4 or 8 elements. At least this cuts the number of barriers by 3/4 or 7/8 compared to writing individual floats.
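The barrier arithmetic in the PS can be sketched as below. It assumes, based only on the observation above and not on any documented compiler behaviour, that one barrier follows every local store and that a vstoreN of N floats counts as a single store; `barriers_for` is a hypothetical helper.

```c
#include <assert.h>

/* Assumed model of the observed compiler behaviour (not documented):
 * one barrier is inserted after every local-memory store, and a vstoreN
 * of N floats counts as one store. */
static unsigned barriers_for(unsigned n_floats, unsigned vec_width)
{
    assert(vec_width > 0);
    /* ceil(n_floats / vec_width) stores, hence that many barriers. */
    return (n_floats + vec_width - 1) / vec_width;
}
```

For 8 floats per work-item this gives 8 barriers with scalar stores, 2 with vstore4 and 1 with vstore8, a reduction of 3/4 and 7/8 respectively.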