I'm trying to figure out if my understanding of mem_fence() is wrong based on a problem I'm having.
My understanding is that mem_fence() controls the order in which memory operations become visible to other work items in a workgroup, and that memory is always consistent within a single work item. I've run into a situation that makes me unsure what consistency within a single work item actually means. Within a single work item, a later read from an address I have just written returns the old value that the write was supposed to replace. No other work items touch that address. If I put a global mem_fence() between the write and the later read, I get the newly written value as expected.
I'm using SDK 2.5, Catalyst 11.7, Linux x86_64 on a Radeon 6970
Here is pseudocode for what I'm seeing:
    __global volatile int* buffer;

    i = <calculate some index into buffer>
    buffer[i] = x;                       // suppose buffer[i] originally contains 'a'
    // mem_fence(CLK_GLOBAL_MEM_FENCE);  // this works correctly if I place the fence here
    j = <some other index calculation>   // j happens to be equal to i
    y = buffer[j];                       // y here is equal to a, not the recently written x
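For anyone who wants to try this, here is a minimal sketch of a kernel with the same shape as the pseudocode above. It is only an illustration under assumptions: the names repro_kernel, out, stride and use_fence are hypothetical, the two index calculations are stand-ins for the real ones, and the shim at the top exists only so the same source can be sanity-checked as plain C on the host (an OpenCL compiler defines __OPENCL_VERSION__ and skips it).

```c
/* Host-only shim (assumption for illustration): lets this file build as plain C.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this block. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define CLK_GLOBAL_MEM_FENCE 0
#define get_global_id(dim) ((size_t)0)   /* pretend there is a single work item */
#define mem_fence(flags)   ((void)(flags)) /* no-op on the host */
#endif

/* Hypothetical reproducer: each work item writes x to its own slot and
   reads the same slot back through a differently computed index. */
__kernel void repro_kernel(__global volatile int *buffer,
                           __global int *out,
                           int x, int stride, int use_fence)
{
    size_t gid = get_global_id(0);

    size_t i = gid * (size_t)stride;            /* first index calculation */
    buffer[i] = x;                              /* overwrite the original value 'a' */

    if (use_fence) {
        mem_fence(CLK_GLOBAL_MEM_FENCE);        /* the workaround described above */
    }

    size_t j = (gid * (size_t)stride * 2) / 2;  /* other calculation, equal to i */
    out[gid] = buffer[j];                       /* expected x; 'a' is the symptom */
}
```

On the device, the idea would be to fill buffer with a known value, run the kernel once with use_fence = 0 and once with use_fence = 1, and compare out against x in both cases.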