AFAIK, __global buffers are always cacheable.
Can you please cite the manual you are referring to?
It's only a short note in the glossary of both the IL reference guide (July 2010, v.2.0d) and the Stream Computing programming guide (rev2.01).
So, correspondingly, I'm using CAL + IL.
In the example from the post above, if I allocate the resource with the GLOBAL_BUFFER flag and run only Kernel_B, I get a 0% hit rate. If I omit that flag, I get a 50% hit rate (and, of course, a shorter runtime).
Currently there is no way to specify that a read from a UAV is cacheable. This will be fixed in an upcoming release by specifying the '_cached' flag on a UAV instruction.