In the SDK sample code and in posts here, I see that input parameters are usually declared as
__global TYPE * or __global const TYPE *, where TYPE can be any valid type, for example int.
However, there is a recommendation (or at least a way) to declare an input buffer with the __constant address space qualifier to take advantage of the constant buffers/caches on Radeon.
The __constant qualifier can refer to global memory as well.
Running even a very simple kernel through SKA shows that performance is worse when the input array is declared as
__constant int * than when it is declared as __global const int *. Are the constant caches so small and ineffective that they are only good for parameters that are not memory objects?
Also, is there any performance difference between declaring the input as
__global int * and __global const int *, or is const just language-level protection against writing into the input array?
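For reference, the three declarations being compared can be put side by side. This is only a minimal sketch: the kernel names and the trivial copy body are made up for illustration, and the OpenCL C source is kept in a host-side C string, the way it would typically be handed to clCreateProgramWithSource.

```c
/* OpenCL C source held in a host-side string. The kernel names and the
   copy body are hypothetical; only the qualifiers on `in` differ. */
static const char *const kInputQualifierKernels =
    /* Plain global pointer: the kernel may also write through `in`. */
    "__kernel void copy_global(__global int *in, __global int *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i];\n"
    "}\n"
    /* Global pointer to const: writes through `in` are compile errors. */
    "__kernel void copy_global_const(__global const int *in,\n"
    "                                __global int *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i];\n"
    "}\n"
    /* Constant address space: read-only, may be served by constant HW. */
    "__kernel void copy_constant(__constant int *in, __global int *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i];\n"
    "}\n";
```

Building all three from the same source string makes it easier to time them against each other on the same device, since only the address space qualifier on the input changes.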
On 4xxx, it apparently does not matter. I have done some testing on a 4670, and both declarations show almost the same performance. Since I was working on Linux, where no SKA was available, I concluded that this could be because constant memory on those boards is emulated in global memory. But I may be wrong on that.
Using __constant is the better choice performance-wise, but AMD GPUs have limited constant space in hardware, only 16 kB IIRC, while the OpenCL spec requires 64 kB, so AMD has to emulate it in global memory. However, you can use the real constant hardware if you specify a maximum size for the buffer. You can find more in the OpenCL programming guide from AMD.
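A rough sketch of what specifying the maximum size could look like is below. It assumes the max_constant_size attribute as I recall it from the AMD programming guide, so check the exact spelling and limits there; the kernel name and body are hypothetical, and 16384 bytes simply mirrors the 16 kB figure above. As before, the OpenCL C source sits in a host-side C string.

```c
/* OpenCL C source held in a host-side string. The attribute name is
   recalled from the AMD programming guide (an assumption to verify);
   the kernel name and body are hypothetical. */
static const char *const kSizedConstantKernel =
    "__kernel void copy_sized_constant(\n"
    /* Promise that the buffer bound to `in` is at most 16384 bytes. */
    "    __constant int *in __attribute__((max_constant_size(16384))),\n"
    "    __global int *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = in[i];\n"
    "}\n";
```

The size is given in bytes, so with 4-byte ints this covers at most 4096 elements; a larger buffer would not fit the stated bound.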
Thank you, Micah!
Just to dig into this a little further:
Does this mean that __constant int * and const global int * can be used interchangeably in the future? (That is, today __constant int * uses the constant cache unconditionally while const global int * does not, but someday const global int * will be optimized to use the constant cache properly?)