For a memory location that is accessed by only a single thread (work-item), is it allowed to do:
old = load_buffer(global_addr);
store_buffer(new, global_addr); // will be executed before the memory load is finished
and expect old to contain the original value and the memory to contain the new value?
Please look at the memory consistency section of the OpenCL specification.
Here it is, pasted from the OpenCL 1.2 spec for your reference:
Within a work-item memory has load / store consistency. Local memory is consistent across
work-items in a single work-group at a work-group barrier. Global memory is consistent across
work-items in a single work-group at a work-group barrier, but there are no guarantees of
memory consistency between different work-groups executing a kernel.
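So yes: because memory has load / store consistency within a single work-item, old will contain the original value and the memory will contain the new value, as long as no other work-item touches that location.
I hope this answers your question.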
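To make the single work-item case concrete, here is a minimal sketch in plain C rather than OpenCL C, so that it can be compiled and checked on the host. The function name exchange and its parameters are made up for illustration. In an actual kernel the pointer would be a __global pointer and the index would typically come from get_global_id(0). It only shows the load-then-store pattern from the question and says nothing about visibility to other work-items or work-groups.

```c
#include <stddef.h>

/* Hypothetical helper: plain C stand-in for what one work-item does.
 * Assumption: no other thread or work-item accesses buf[idx]. */
int exchange(int *buf, size_t idx, int new_value)
{
    int old = buf[idx];    /* load: observes the value present before the store */
    buf[idx] = new_value;  /* store: ordered after the load in program order */
    return old;            /* the original value, not new_value */
}
```

Calling exchange(buf, 1, 42) on a buffer holding {7, 9} would return 9 and leave the buffer as {7, 42}. Whatever reordering the hardware does internally, the result seen by that one work-item has to match program order.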