13 Replies Latest reply on Mar 28, 2011 12:07 PM by lupescu_grigore

    OpenCL mem_fence

    lupescu_grigore
      Given the following kernel:
      __kernel void testKernel(__global float* input, __global float* output, int nrElements)
      {
          uint x = get_global_id(0);
          if (x < (nrElements - 1))
              output[x] = input[x + 1];
          else
              output[x] = input[0];

          mem_fence(CLK_LOCAL_MEM_FENCE);
          output[x]++;
      }
      And the input 0 1 0 1 0 1 0 1 0 1,
      one would expect 2 1 2 1 2 1 2 1 as output (shift left by one, then add 1).
      Results are:
      No mem_fence:
        CPU Core i3: OK
        GPU 5470: OK
      With mem_fence:
        CPU Core i3: OK
        GPU 5470: ERROR - values got { 1 2 3 4 5 } ... 2 1 2 1 2 ... 3 2 3 2 .. 5 4 5 4...
      Inputs need not be very long; 100-1000 elements will do.
      Am I missing something?
        • OpenCL mem_fence
          Meteorhead

          You are trying to mem_fence a variable with the wrong address space qualifier. output is a __global variable, but you are using CLK_LOCAL_MEM_FENCE.

          You should be using CLK_GLOBAL_MEM_FENCE.

          Please read the related section of the OpenCL specification; it states clearly what the purpose of this function is (loads and stores that a work-item issues before the fence are committed, in the specified address space, before any loads and stores it issues after the fence). Also read the restrictions on using synchronization commands of this type.
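As a rough sketch of what this suggestion looks like, the kernel from the original post is repeated below with only the fence flag changed to CLK_GLOBAL_MEM_FENCE. So that it can be compiled and checked as plain C on the host, it is wrapped in a few stand-ins that are assumptions and not part of OpenCL: the empty `__kernel` / `__global` macros, the placeholder flag value, the stub `get_global_id` and `mem_fence`, and the hypothetical helper `run_test_kernel`, which runs the work-items one after another. This only illustrates the intended shift-and-increment result; it says nothing about memory ordering on a real GPU, and the thread so far does not confirm that the flag change fixes the 5470 output.

```c
#include <assert.h>
#include <stddef.h>

/* Host-side stand-ins (assumptions, NOT part of OpenCL). They only let the
   kernel text below compile as plain C and run one work-item at a time. */
#define __kernel
#define __global
#define CLK_GLOBAL_MEM_FENCE 0x02   /* placeholder value for this sketch */
typedef unsigned int uint;

/* Hypothetical: index of the work-item currently being emulated. */
static size_t emulated_global_id = 0;

static size_t get_global_id(uint dim) { (void)dim; return emulated_global_id; }

/* No-op here: a sequential host loop has nothing to reorder. */
static void mem_fence(int flags) { (void)flags; }

/* Kernel from the post, with the fence flag matching the __global buffer. */
__kernel void testKernel(__global float* input, __global float* output, int nrElements)
{
    uint x = get_global_id(0);

    if (x < (uint)(nrElements - 1))
        output[x] = input[x + 1];    /* shift left by one */
    else
        output[x] = input[0];        /* last element wraps around */

    mem_fence(CLK_GLOBAL_MEM_FENCE); /* was CLK_LOCAL_MEM_FENCE */
    output[x]++;                     /* add 1 */
}

/* Hypothetical helper: emulate a 1D range of nrElements work-items. */
static void run_test_kernel(float *input, float *output, int nrElements)
{
    assert(nrElements > 0);
    for (emulated_global_id = 0;
         emulated_global_id < (size_t)nrElements;
         ++emulated_global_id)
        testKernel(input, output, nrElements);
}
```

Calling `run_test_kernel` on the ten-element input 0 1 0 1 0 1 0 1 0 1 from the original post fills the output with 2 1 2 1 2 1 2 1 2 1. In a real OpenCL program only the `testKernel` function would be kept, and the stand-ins above would be dropped.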