landmann
Journeyman III

__restrict without effect?

Hi,

I have a kernel of the form

__kernel void  main(__global __write_only float4 * __restrict dest,
        __global __read_only float4 *__restrict src) { }

and within it I am copying some elements from src to dest.

I observed that a copy sequence of the form

dest[idx] = src[idx2]; dest[idx+offset] = src[idx2+offset2];

results in a slow-path write involving a wait_ack_outstanding after the first write.

Rewriting this by loading the two values first into locals, then writing them out results in the write pattern I expect to see.

So why does using "restrict" not lead to the same optimization?
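
For readers following along, here is a minimal sketch of the two forms described above, written as plain C99 rather than OpenCL C so it compiles on the host. The `vec4` type and both function names are made up for illustration; in a real kernel the pointers would also carry `__global` and the indices would be derived from `get_global_id`. A compiler that honors `restrict` is allowed to treat both forms the same way. When the qualifier is ignored, only the second form spells the load/load/store/store order out in the source.

```c
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's float4 so the sketch compiles as host C. */
typedef struct { float x, y, z, w; } vec4;

/* Interleaved form from the post: load, store, load, store.
 * If restrict is ignored, the compiler must assume the first store
 * could change the value read by the second load. */
void copy_interleaved(vec4 *restrict dest, const vec4 *restrict src,
                      size_t idx, size_t offset,
                      size_t idx2, size_t offset2)
{
    dest[idx] = src[idx2];
    dest[idx + offset] = src[idx2 + offset2];
}

/* Grouped form: both loads land in locals before any store is issued. */
void copy_grouped(vec4 *restrict dest, const vec4 *restrict src,
                  size_t idx, size_t offset,
                  size_t idx2, size_t offset2)
{
    const vec4 first = src[idx2];
    const vec4 second = src[idx2 + offset2];

    dest[idx] = first;
    dest[idx + offset] = second;
}
```

Both functions produce the same result as long as `dest` and `src` do not overlap, which is exactly the promise `restrict` makes to the compiler.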

5 Replies
Meteorhead
Challenger

I do not know what the __restrict specifier is supposed to do, so you might be right. Still, with simple groupings like this it is wise to follow a general rule that is also pointed out in AMD's OpenCL programming guide: group loads, computations, and stores as much as possible. This is because the packing of VLIW instructions breaks whenever the instruction stream switches between ALU/LDS instructions and FETCH instructions.

One should FETCH as much data as possible (within reasonable bounds) and try to do all the ALU operations needed before writing results out to __global.
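
The load/compute/store grouping can be sketched in plain C99 as follows. This is an illustrative host-side analogue, not code from the AMD guide: the function name, the four-element width, and the scale-and-bias arithmetic are assumptions chosen only to make the three phases visible.

```c
#include <stddef.h>

/* Hypothetical per-work-item body that processes four consecutive floats. */
void scale_bias_grouped(float *restrict dest, const float *restrict src,
                        size_t base, float scale, float bias)
{
    /* Phase 1: issue all loads. */
    const float a = src[base + 0];
    const float b = src[base + 1];
    const float c = src[base + 2];
    const float d = src[base + 3];

    /* Phase 2: do all arithmetic on the loaded values. */
    const float ra = a * scale + bias;
    const float rb = b * scale + bias;
    const float rc = c * scale + bias;
    const float rd = d * scale + bias;

    /* Phase 3: issue all stores. */
    dest[base + 0] = ra;
    dest[base + 1] = rb;
    dest[base + 2] = rc;
    dest[base + 3] = rd;
}
```

Whether this layout actually improves instruction packing depends on the compiler and hardware generation, so it is worth checking the generated ISA rather than assuming it.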

MicahVillmow
Staff

In SDK 2.3 we ignore the restrict keyword but plan on having it work in the next SDK. Btw, __read_only is not a valid keyword for pointers, use const instead.

Thanks for pointing out the __read_only and __write_only qualifiers. They are leftovers from prototyping, before I realized my card does not support images and I had to switch to buffers.

Looking forward to 2.4!

Joerg


Originally posted by: MicahVillmow In SDK 2.3 we ignore the restrict keyword but plan on having it work in the next SDK. Btw, __read_only is not a valid keyword for pointers, use const instead.


Is it mentioned anywhere?


Yes, it is mentioned in the OpenCL spec, section 6.6 (Access Qualifiers).
