Forcing uncached memory read.

Discussion created by on Dec 22, 2012
Latest reply on Dec 26, 2012 by binying



It seems that VLIW4/VLIW5 (don't know about GCN) architecture can do uncached UAV reads, but OpenCL doesn't expose that kind of functionality. There is one way to do it right now with this code:


template <typename T>
T uncached_load(__global volatile T *volatile pointer) {
     return *pointer;


But there's a drawback that multiple loads aren't grouped in one TEX clause.


With uncached loads it's possible to gain some performance, for example:


__kernel void square(__global float *buffer) {
     uint gid = get_global_id(0);
     float value = uncached_load<float>(&buffer[gid]);
     buffer[gid] = value * value;


works faster with uncached load than cached one. It's most likely that writing to a memory location that's already in L1 cache incurs some sort of performance penalty. Anyway with uncached loads programmers would have a bit more control over small L1 cache.


It would be nice to have some sort of intrinsic/extension to OpenCL to have uncached loads. It would benefit CPU kernels as well, as most CPUs have streaming load/store instructions that doesn't pollute caches.