
    Is a mem_fence supposed to be needed here?

    arsenm

      I'm trying to figure out if my understanding of mem_fence() is wrong based on a problem I'm having.

      My understanding of mem_fence() is that it controls the order in which memory operations become visible to other work-items in the same work-group, and that memory is always consistent within a single work-item. I've run into a situation that makes me question what consistency within a single work-item actually means. Within a single work-item, a later read from a previously written address returns the old value rather than the one just written. No other work-items touch that address. If I put a global mem_fence() between the write and the later read, I get the correct, newly written value.

      I'm using SDK 2.5, Catalyst 11.7, Linux x86_64 on a Radeon 6970

      Here is pseudocode for what I'm seeing:

      __global volatile int* buffer;

      i = calculate some index into buffer
      buffer[i] = x                      // suppose buffer[i] originally contains 'a'
      // mem_fence(GLOBAL)               // this works correctly if I place it here
      j = some other index calculation   // j happens to be equal to i
      y = buffer[j]                      // y here is equal to 'a', not the recently written x
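
      For concreteness, here is a minimal self-contained kernel sketching the pattern above; the kernel name, parameter list, and index math are illustrative assumptions, not taken from the original post:

      // Hypothetical reproduction sketch: each work-item writes buffer[i] and then
      // reads the same location back through an independently computed index j == i.
      __kernel void repro_read_after_write(__global volatile int* buffer,
                                           __global int* out,
                                           int x)
      {
          size_t gid = get_global_id(0);

          size_t i = (gid * 2) % get_global_size(0);   // some index calculation
          buffer[i] = x;                               // overwrite the old value 'a'

          // mem_fence(CLK_GLOBAL_MEM_FENCE);          // uncommenting this makes the read below return x

          size_t j = (gid * 2) % get_global_size(0);   // a different expression that happens to equal i
          out[gid] = buffer[j];                        // observed: the old value 'a'; expected: x
      }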

        • Is a mem_fence supposed to be needed here?
          maximmoroz

          mem_fence places limits on reordering reads and writes by the compiler. It is perfectly valid to use mem_fence even if you don't need to synchronize between work-items of the same work-group.

          In your particular case, if you don't specify a memory fence, the compiler might generate code which reads buffer[j] before writing to buffer[i], since this reordering can speed up the kernel by bunching the global memory read instructions together into a single clause.
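
          To illustrate the placement being discussed, the fence goes between the global write and the later read; this is a sketch against the hypothetical kernel above, not the original code:

          buffer[i] = x;                     // global write
          mem_fence(CLK_GLOBAL_MEM_FENCE);   // stops the compiler from hoisting the read above the write
          y = buffer[j];                     // with j == i, this now sees x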