I think it would be good to add a mechanism that gives better control over memory caching.
For instance, you could add a flag to clCreateBuffer, such as CL_CACHED:
clCreateBuffer ( ....., CL_MEM_READ_ONLY | CL_CACHED ... )
Another option is to add a modifier keyword alongside __global:
__kernel void MyKernel ( __global __cached int *w )
{
    const int data = w[get_global_id(0)];
}
Sometimes I want global memory accesses to be cached, and other times I don't.
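
To make the flag idea a bit more concrete, here is a rough sketch in plain C of how such a bit could sit next to the existing buffer flags and how a runtime might test for it. It is only an illustration under my own assumptions: mem_flags_t stands in for cl_mem_flags, MEM_READ_ONLY mirrors the value of CL_MEM_READ_ONLY from CL/cl.h, and MEM_CACHED_HYPOTHETICAL (with its bit position) and wants_cached are invented names that do not exist in any OpenCL version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for cl_mem_flags, which is a 64-bit bitfield in the real headers. */
typedef uint64_t mem_flags_t;

/* Same value as CL_MEM_READ_ONLY in CL/cl.h. */
#define MEM_READ_ONLY ((mem_flags_t)1 << 2)

/* HYPOTHETICAL: the proposed CL_CACHED flag. The bit position is made up
   for this sketch and is not part of any OpenCL specification. */
#define MEM_CACHED_HYPOTHETICAL ((mem_flags_t)1 << 40)

/* HYPOTHETICAL helper: the kind of check a runtime could perform inside
   clCreateBuffer to see whether the caller asked for cached access. */
static bool wants_cached(mem_flags_t flags)
{
    return (flags & MEM_CACHED_HYPOTHETICAL) != 0;
}
```

With this, wants_cached(MEM_READ_ONLY | MEM_CACHED_HYPOTHETICAL) is true and wants_cached(MEM_READ_ONLY) is false, so existing code that never sets the new bit would keep its current behaviour.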