I'm trying to figure out whether my understanding of mem_fence() is wrong, based on a problem I'm having.
My understanding is that mem_fence() controls the order in which memory operations become visible to other work-items in a work-group, and that memory is always consistent within a single work-item. I've run into a situation where I'm no longer sure I'm interpreting "consistency within a single work-item" correctly. Within a single work-item, a later read from a previously written address returns the old, overwritten value. No other work-items touch that address. If I put a global mem_fence() between the write and the later read, I get the newly written value.
I'm using SDK 2.5 and Catalyst 11.7 on Linux x86_64 with a Radeon HD 6970.
Here is pseudocode for what I'm seeing:
```c
__global volatile int* buffer;

i = /* calculate some index into buffer */;
buffer[i] = x;                          // suppose buffer[i] originally contains 'a'
// mem_fence(CLK_GLOBAL_MEM_FENCE);     // this works correctly if I place it here
j = /* some other index calculation */; // j happens to be equal to i
y = buffer[j];                          // y here is equal to 'a', not the recently written x
```
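To pin down what "consistent within a single work-item" should mean for the pseudocode above, here is a plain C sketch that runs the same steps strictly in program order on an ordinary host array. The function name `run_in_program_order` and the concrete values are hypothetical, and nothing GPU-specific is modeled; it only shows the result one work-item should observe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: executes the pseudocode's write-then-read in program
 * order, which is what a single work-item is expected to observe. */
static int run_in_program_order(int *buffer, size_t n, size_t i, size_t j, int x)
{
    assert(i < n && j < n);  /* both indices must be in bounds */
    buffer[i] = x;           /* the write */
    return buffer[j];        /* the later read: returns x whenever j == i */
}
```

For example, with every element initially 7, calling it with `i == j == 2` and `x == 42` returns 42, while `j == 3` returns the untouched 7.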
mem_fence places limits on reordering reads and writes by the compiler. It is perfectly valid to use mem_fence even if you don't need to synchronize between work-items of the same work-group.
In your particular case, without a memory fence the compiler might generate code that reads buffer[j] before writing to buffer[i], because this reordering can speed up the kernel by grouping global memory read instructions together in a single clause.
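The reordering described above can be illustrated with a plain C sketch that hoists the read of `buffer[j]` above the write to `buffer[i]`. The function name `run_with_hoisted_read` is hypothetical, and this only models the resulting instruction order on the host; it is not the code the compiler actually emitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the same write and read as the question's pseudocode,
 * but with the read of buffer[j] moved ahead of the write to buffer[i]. */
static int run_with_hoisted_read(int *buffer, size_t n, size_t i, size_t j, int x)
{
    assert(i < n && j < n);  /* both indices must be in bounds */
    int y = buffer[j];       /* read hoisted above the write */
    buffer[i] = x;           /* the write, now after the read */
    return y;                /* stale when j == i: the old buffer[i] value */
}
```

When `j != i` this order returns the same value as program order, which is what makes grouping the reads attractive. When `j == i`, as in the question, it returns the old value (the 'a' in the pseudocode) even though the write still lands, which matches the observed symptom. Placing a mem_fence between the write and the read rules this order out.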