Bdot

atomic_inc not reliable?

Discussion created by Bdot on Jul 27, 2011
Latest reply on Jul 29, 2011 by Bdot
1024 calls to atomic_inc result in only ~800 increments

Hi,

How do I use atomic_inc correctly? I have a test program that either skips some of the atomic_inc's, or the function is not as atomic as it should be. Can someone point out the error?

When the attached kernel is started with fewer than 10 threads, the resulting buffer contains:

1 1 1 1 0 0 0 ... 0

With more threads, the counts begin to increase. For 10-12 threads it looks like:

5 2 2 2 3 3 3 4 4 4 5 5 5 5 5 5 0 0 0 ... 0

As the number of threads increases, on average only about 80% of the atomic_inc calls appear to take effect. Using atom_inc instead makes no difference.

My expectation would be that the first number always contains the total number of threads (since each thread runs atomic_inc exactly once). The following numbers can be out of order; that's no problem. But in the example above, thread 5 seems to have written its results twice ...
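To make that expectation concrete, here is a minimal host-side sketch in plain C (not OpenCL) that models what the buffer should look like if every increment takes effect. It is only an illustration: the names model_atomic_inc and model_test_k are hypothetical, the increment is a sequential stand-in for atomic_inc (which returns the old value), and the work-items are assumed to claim their slots in order of global id, which a real device does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

#define RES_LEN 32      /* the kernel zeroes res[0..31] */
#define MAX_RESULTS 10  /* the kernel's "limit to 10 results" */

/* Sequential stand-in for atomic_inc: increments *p and returns the OLD value. */
unsigned int model_atomic_inc(unsigned int *p)
{
    unsigned int old = *p;
    *p = old + 1;
    return old;
}

/* Fill res with the expected contents after n_threads work-items have run.
   Assumption: slots are claimed in order of global id; on a real device the
   order of the triples may differ, but res[0] should still equal n_threads. */
void model_test_k(unsigned int n_threads, unsigned int res[RES_LEN])
{
    for (size_t k = 0; k < RES_LEN; k++)
        res[k] = 0;

    for (unsigned int gid = 0; gid < n_threads; gid++) {
        unsigned int f = gid + 1;                    /* reported ids start at 1 */
        unsigned int i = model_atomic_inc(&res[0]);  /* old counter value = slot */
        if (i < MAX_RESULTS) {
            assert(i * 3 + 3 < RES_LEN);             /* highest index used is 30 */
            res[i * 3 + 1] = f;
            res[i * 3 + 2] = f;
            res[i * 3 + 3] = f;
        }
    }
}
```

Under these assumptions, model_test_k(4, res) leaves the buffer as 4 1 1 1 2 2 2 3 3 3 4 4 4 0 ... 0, which is the shape I would expect from the kernel (up to the order of the triples) rather than 1 1 1 1 0 0 0 ... 0.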

I'm running an HD5770 with Catalyst 11.6 on SuSE 11.4/64-bit.

Thanks

#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

#define ATOMIC_INC(x) atomic_inc(&x)
//#define ATOMIC_INC(x) atom_inc(&x)
//#define ATOMIC_INC(x) ((x)++)

__kernel void test_k( __global uint *res)
{
    __local uint i,f;

    f = get_global_id(0);
    if (f==0)
    {
        for(i=0;i<32;i++) res[i]=0;
    }
    f++; // let the reported results start with 1
    i=ATOMIC_INC(res[0]);
    if(i<10) /* limit to 10 results */
    {
        res[i*3 + 1]=f;
        res[i*3 + 2]=f;
        res[i*3 + 3]=f;
    }
}
