5 Replies Latest reply on Jan 29, 2011 3:04 PM by landmann

    __restrict without effect?

    landmann

      Hi,

      I have a kernel of the form

      __kernel void main(__global __write_only float4 *__restrict dest,
                         __global __read_only float4 *__restrict src) { }

      and within it I am copying some elements from src to dest.

      I observed that a copy sequence of the form

      dest[idx] = src[idx2];
      dest[idx+offset] = src[idx2+offset2];

      results in slow-path writes, with a wait_ack_outstanding issued after the first write.

      If I rewrite this to load both values into local variables first and then write them out, I get the write pattern I expect to see.
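
      To make the two variants concrete, here is a minimal sketch in plain C99 rather than OpenCL C. The float4 struct, the function names, and the index parameters are assumptions made up for illustration; they stand in for the kernel's actual types and indexing, which are not shown above.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4 so this compiles as plain C99. */
typedef struct { float x, y, z, w; } float4;

/* Original form: each store directly follows its load. */
void copy_interleaved(float4 *restrict dest, const float4 *restrict src,
                      size_t idx, size_t offset, size_t idx2, size_t offset2)
{
    dest[idx]          = src[idx2];
    dest[idx + offset] = src[idx2 + offset2];
}

/* Rewritten form: both loads are issued first, then both stores. */
void copy_loads_first(float4 *restrict dest, const float4 *restrict src,
                      size_t idx, size_t offset, size_t idx2, size_t offset2)
{
    const float4 a = src[idx2];
    const float4 b = src[idx2 + offset2];
    dest[idx]          = a;
    dest[idx + offset] = b;
}
```

      Both functions store the same values. With restrict on both pointers, a compiler is allowed to assume the first store cannot change what the second load reads, so it could in principle reorder the first form into the second by itself.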

      So why does using "restrict" not lead to the same optimization?