I have the following questions about local vs. global memory:
1. Are global memory accesses cached? Are they always cached, or does it depend on the hardware or the OpenCL specification?
I want to compare a very simple kernel that only returns its input element with a kernel that also returns the input element but first writes it to local memory. Neither kernel does any calculation. In the first kernel, the value itself is passed to the kernel; in the second, a pointer to that one value is passed. This is purely a simple theoretical comparison.
If all global memory accesses are cached, it is obvious that the second version cannot be faster than the first one. I also assume that the caches for global memory are faster than an access to local memory. That leads to my second question:
2. If my assumption is wrong, how many local memory accesses would it take to outperform the global memory solution? Or is there no such number?
I appreciate your help!
1. Yes, global accesses can be cached. read_only is a hint to the compiler that the buffers it marks can be cached.
In some cases it might not be a good idea to use LDS (the local data share, i.e. local memory), as caching can sometimes be efficient enough. In terms of bandwidth, LDS and the L1 cache have very similar values, but actual performance may depend on the algorithm.
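Since the outcome depends on the algorithm, it may help to write down the two variants from the question and time them on the target device. The following is only a minimal sketch under assumptions: it interprets the comparison as a per-element copy, and the names copy_direct, copy_via_local, tile and emu_check are hypothetical, not taken from the question. The #ifndef sections are stand-ins so the kernel bodies also compile as plain C for a quick serial sanity check; in a real OpenCL C file the qualifiers, get_global_id, get_local_id and barrier are built in.

```c
#ifndef __OPENCL_VERSION__
/* Host-only stand-ins (assumption: compiled as ordinary C, not OpenCL C). */
#include <assert.h>
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 0
static size_t emu_gid; /* index of the emulated work-item */
static size_t get_global_id(unsigned dim) { (void)dim; return emu_gid; }
static size_t get_local_id(unsigned dim)  { (void)dim; return 0; }
static void barrier(int flags) { (void)flags; }
#endif

/* Variant 1: read from global memory, write straight back. */
__kernel void copy_direct(__global const float *in, __global float *out)
{
    size_t i = get_global_id(0);
    out[i] = in[i];
}

/* Variant 2: stage the element in local memory before writing it back. */
__kernel void copy_via_local(__global const float *in, __global float *out,
                             __local float *tile)
{
    size_t i = get_global_id(0);
    size_t l = get_local_id(0);
    tile[l] = in[i];               /* global -> local */
    barrier(CLK_LOCAL_MEM_FENCE);  /* make the local write visible */
    out[i] = tile[l];              /* local -> global */
}

#ifndef __OPENCL_VERSION__
/* Serial emulation with work-groups of size 1: runs both variants for each
   work-item and returns 1 if both reproduce the input, 0 otherwise. */
int emu_check(void)
{
    const float in[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float direct[4] = {0}, staged[4] = {0};
    float tile[1];
    for (emu_gid = 0; emu_gid < 4; ++emu_gid) {
        copy_direct(in, direct);
        copy_via_local(in, staged, tile);
    }
    for (size_t i = 0; i < 4; ++i) {
        if (direct[i] != in[i] || staged[i] != in[i])
            return 0;
    }
    return 1;
}
#endif
```

On a real device, the local buffer would be sized from the host with clSetKernelArg(kernel, 2, local_size * sizeof(float), NULL). Each element is read only once here, so the staged variant does strictly more work; local memory can only pay off when the same element is reused several times within a work-group.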
You can find more information in the OpenCL programming guide.