Is it possible to write to global memory while bypassing the cache, like we can on a CPU?
It seems that in most situations we don't write temporal data to global memory. In fact, I have never met a case where caching writes is useful.
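For reference, the CPU-side technique mentioned above can be sketched in C as follows. This is only an illustration, assuming an x86 target with SSE2; `fill_streaming` is a hypothetical helper name, and the plain-store fallback is there only so the sketch compiles elsewhere. None of this is available inside an OpenCL kernel.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32, _mm_sfence */
#endif

/* Hypothetical helper: fill dst[0..n) with value.
 * With SSE2, each store is non-temporal, i.e. it carries a hint
 * that the data should not be kept in the CPU caches. */
void fill_streaming(int32_t *dst, size_t n, int32_t value)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < n; ++i)
        _mm_stream_si32((int *)&dst[i], value);
    /* Order the streaming stores before any later stores. */
    _mm_sfence();
#else
    /* Assumption: no SSE2, so fall back to ordinary cached stores. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = value;
#endif
}
```

Calling `fill_streaming(buf, n, 0)` on a large output buffer is the typical use: the data is written once and not read back soon, so keeping it in cache would only evict more useful lines.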
Sorry for this late reply.
No, you can't do this in OpenCL right now. However, we are considering this feature for a future release. The Khronos OpenCL working group is also aware of a request for it.
Any updates on OpenCL non-temporal stores? By the way, recent Ryzen perf improvements involved removing unnecessary NT stores.
Sorry, there is no update at this point. I'll share any news as soon as I hear of it.
You can do that on AMD GPUs if you have the time to patch the GPU ISA.
See the GCN instruction set architecture: load/store instructions have a bit called 'SLC' (System Level Coherent). If it is set, the GPU basically bypasses all its caches.