I have the following questions about local vs. global memory:
1. Are global memory accesses cached? Are they always cached, or does it depend on the hardware or on the OpenCL specification?
I want to compare two very simple kernels. The first only returns its input element. The second also returns the input element, but writes it to local memory first. Neither kernel does any computation. In the first kernel the value itself is passed as an argument; in the second, a pointer to that one value is passed. This is purely a simple theoretical comparison.
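To make the comparison concrete, here is a minimal sketch of the two kernels I have in mind. The names `echo_direct`, `echo_via_local`, `out`, and `scratch` are hypothetical, and I am assuming a single `float` input, a 1D launch, and a `scratch` buffer with at least one slot per work-item in the work-group. The `#ifndef __OPENCL_VERSION__` part is only a host-side stub (my own assumption, not part of the OpenCL API) so that the same file also compiles as plain C for a quick sanity check.

```c
/* Host-side stub: only used when this file is NOT compiled by an OpenCL C
 * compiler. It blanks out the address-space qualifiers and fakes the
 * work-item ID functions so the kernels can be called like normal C functions. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __local
static size_t host_gid = 0; /* hypothetical: pretend global ID */
static size_t host_lid = 0; /* hypothetical: pretend local ID  */
static size_t get_global_id(unsigned dim) { (void)dim; return host_gid; }
static size_t get_local_id(unsigned dim)  { (void)dim; return host_lid; }
#endif

/* Kernel 1: the value is passed by value and written straight to the output. */
__kernel void echo_direct(float value, __global float *out)
{
    size_t gid = get_global_id(0);
    out[gid] = value;
}

/* Kernel 2: a pointer to the one value is passed; the value is staged in
 * local memory before being written to the output. Each work-item only
 * reads back the slot it wrote itself, so no barrier is needed here. */
__kernel void echo_via_local(__global const float *in,
                             __global float *out,
                             __local float *scratch)
{
    size_t gid = get_global_id(0);
    size_t lid = get_local_id(0);
    scratch[lid] = *in;       /* global -> local */
    out[gid] = scratch[lid];  /* local  -> global */
}
```

Both kernels produce the same output; the second one just adds one local memory write and one local memory read per work-item on top of the global read.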
If all global memory accesses are cached, it seems obvious that the second version cannot be faster than the first one. I also assume that the global memory cache is faster than an access to local memory. That leads to my second question:
2. If my assumption is wrong, how many local memory accesses would it take for the local memory version to outperform the global memory version? Or is there no such number?
I appreciate your help!