Re: Cuckoo hashing in OpenCL

himanshu.gautam wrote:


Since you are initializing the local variable, a barrier is not required here.

Also, in the loop you are doing something with local memory, right? I don't know exactly what you are doing, but my guess is that if you are reusing or updating the values of a local variable, all the threads which are < size will try to do so at the same time. This is one thing you need to take care of.

One more thing you can try: use GLOBAL MEM FENCE instead of LOCAL MEM FENCE and check the results.


     I'll try to make it clearer. I use a local variable alert that is (re)set to 0 when the outer loop starts.

     If a thread fails, it sets alert to 1. I'm reposting the previous code with extra stuff.

     for(attempts=0; attempts<100; attempts++) // all threads in a work group execute the same number of iterations
     {
          // initialize local variables
          if(lid==0) alert = 0;
          barrier(CLK_LOCAL_MEM_FENCE); // <- is this needed? **

          // copy from global memory

          for(i=0; i<3; i++) // all threads in a work group execute all 3 iterations
          {
               // do stuff in local memory
          }

          // do more stuff in local memory

          // if a thread failed to do its work, it sets alert to 1, so that the rest of the threads find out
          if(...) alert = 1;

          // if nobody has set alert to 1, it means everyone succeeded, so the loop breaks and the kernel finishes
          if(alert == 0) break;
     }


     The only explanation I can give from the output is that they fail to synchronize on variable alert, so some threads continue the outer loop and others don't.
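     For reference, here is a minimal sketch of one way to keep every work-item in step on a shared flag like alert. It is not the kernel from this thread: the kernel name retry_until_all_succeed, the fail_until parameter (a stand-in for the elided failure test) and the host-side shim at the top are assumptions made purely for illustration. The idea is three barriers per iteration: one after the reset, one after the flag may have been set, and one after every work-item has copied the flag into a private variable, so that nobody resets alert while a neighbour is still reading it. Whether the real kernel needs all three depends on code not shown here.

```c
/* Host-side shim (an assumption of this sketch, not part of OpenCL):
   lets the kernel compile as plain C for a single-work-item smoke test.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this part. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 1
#define barrier(flags) ((void)0)
#define get_local_id(dim) ((size_t)0)
#define get_group_id(dim) ((size_t)0)
#endif

/* Hypothetical kernel: retries until no work-item in the group reports a
   failure, then stores the number of failed attempts for this work-group. */
__kernel void retry_until_all_succeed(__global int *attempts_used, int fail_until)
{
    __local int alert;                  /* one flag per work-group */
    const size_t lid = get_local_id(0);
    int attempts;

    for (attempts = 0; attempts < 100; attempts++) {
        if (lid == 0) alert = 0;
        barrier(CLK_LOCAL_MEM_FENCE);   /* reset is visible before anyone sets the flag */

        /* Placeholder for the real work; the condition is made up. */
        int failed = (attempts < fail_until);
        if (failed) alert = 1;
        barrier(CLK_LOCAL_MEM_FENCE);   /* all writes to alert are done before it is read */

        int stop = (alert == 0);        /* private copy, identical in every work-item */
        barrier(CLK_LOCAL_MEM_FENCE);   /* nobody resets alert until everyone has read it */

        if (stop) break;                /* uniform decision: all leave or all stay */
    }

    if (lid == 0) attempts_used[get_group_id(0)] = attempts;
}
```

     Because stop is read after a barrier and before the next reset, every work-item takes the same branch, so no barrier is ever reached by only part of the group.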


     Why should I try GLOBAL MEM FENCE? The kernel works in local memory. Only at the end does each work group write its final results to its own location in global memory (no conflicts between work groups).


     Thank you for your time,


Re: Re: Cuckoo hashing in OpenCL

Hello again,

Turns out the problem is on your side.

After looking at this thread, I tried without #pragma unroll and everything seems to work now.

The funny thing is that the NVIDIA driver has the same problem...
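For anyone who wants to check whether the pragma is the culprit in their own kernel, a minimal sketch of how to switch it on and off without editing the source is given here. This is not the attached code: the kernel name three_rounds and the macro ENABLE_UNROLL are hypothetical, and the shim at the top exists only so the kernel body can be smoke-tested as plain C with a single work-item. The macro would be set by passing -D ENABLE_UNROLL in the build options given to clBuildProgram.

```c
/* Host-side shim (an assumption of this sketch): skipped by an OpenCL
   compiler, which defines __OPENCL_VERSION__. */
#ifndef __OPENCL_VERSION__
#include <stddef.h>
#define __kernel
#define __global
#define __local
#define CLK_LOCAL_MEM_FENCE 1
#define barrier(flags) ((void)0)
#define get_local_id(dim) ((size_t)0)
#endif

/* Hypothetical kernel with a short loop that contains a barrier, which is
   the kind of loop where unrolling went wrong in this thread. */
__kernel void three_rounds(__global int *out)
{
    __local int scratch[1];
    const size_t lid = get_local_id(0);
    int i;

    if (lid == 0) scratch[0] = 0;
    barrier(CLK_LOCAL_MEM_FENCE);

    /* The pragma is only seen by the compiler when ENABLE_UNROLL is defined. */
#ifdef ENABLE_UNROLL
#pragma unroll
#endif
    for (i = 0; i < 3; i++) {
        if (lid == 0) scratch[0] += i + 1;   /* round i adds i + 1 */
        barrier(CLK_LOCAL_MEM_FENCE);        /* reached by every work-item, every round */
    }

    if (lid == 0) out[0] = scratch[0];
}
```

Building the same source once with and once without the define, and comparing the outputs, is a quick way to tell a compiler problem from a synchronization bug in the kernel itself.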

I'm attaching cleaner code (with comments on the #pragma unroll) in case you want to check what's going on.

Thanks for the support and a big thanks to mulisak,

