__restrict without effect?

Discussion created by landmann on Jan 28, 2011
Latest reply on Jan 29, 2011 by landmann


I have a kernel of the form

__kernel void  main(__global __write_only float4 * __restrict dest,
        __global __read_only float4 *__restrict src) { }

and within it I am copying some elements from src to dest.

I observed that a copy sequence of the form

dest[idx] = src[idx2];
dest[idx+offset] = src[idx2+offset2];

results in a slow-path write, with a wait_ack_outstanding issued after the first write.

If I rewrite this to load both values into local variables first and only then write them out, I get the write pattern I expect to see.
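For reference, here is a minimal sketch of the two forms side by side. It is written as plain C99 so it compiles on the host: the float4 struct is only a stand-in for the OpenCL vector type, the function names copy_interleaved and copy_loads_first are made up for illustration, and the index parameters are assumed to be in range. In the actual kernel the same statements sit directly in the kernel body.

```c
#include <stddef.h>

/* Stand-in for OpenCL's float4 so the sketch compiles as plain C99. */
typedef struct { float x, y, z, w; } float4;

/* Hypothetical helper: the original form, each store follows its load. */
static void copy_interleaved(float4 *restrict dest, const float4 *restrict src,
                             size_t idx, size_t offset,
                             size_t idx2, size_t offset2)
{
    dest[idx]          = src[idx2];
    dest[idx + offset] = src[idx2 + offset2];
}

/* Hypothetical helper: the rewritten form, both loads first, then both stores. */
static void copy_loads_first(float4 *restrict dest, const float4 *restrict src,
                             size_t idx, size_t offset,
                             size_t idx2, size_t offset2)
{
    const float4 a = src[idx2];            /* load 1 */
    const float4 b = src[idx2 + offset2];  /* load 2 */
    dest[idx]          = a;                /* store 1 */
    dest[idx + offset] = b;                /* store 2 */
}
```

Since dest and src are both restrict-qualified and therefore promised not to overlap, the two functions produce the same result, so a compiler is allowed to turn the first form into the second on its own.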

So why does using __restrict not lead to the same optimization, given that it tells the compiler that dest and src do not alias?