3 Replies Latest reply on Sep 30, 2010 4:15 PM by sblackwell

    makeshift global barrier

      trouble reading from global memory after atom_inc


      I know there's no global barrier in OpenCL, but I'm trying to create a workaround using the following code:


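      void barrier(__global uint* scratch) {
        uint nThreads = get_global_size(0);

        /* each thread increments the shared counter
           (this call is missing from the snippet; reconstructed from the description below) */
        atom_inc(&scratch[0]);

        /* this loop never terminates */
        while(scratch[0] < nThreads) {
        }
      }
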

      The idea is that each thread loops until all threads have incremented that memory.

      I know the memory is being incremented because scratch[0] == nThreads when I read scratch back to host memory, but the loop never terminates. When I have the threads write out the value at scratch[0] elsewhere, they all just report the result of their own atom_inc.

      I know I can normally read just fine from global memory, but what am I missing here?

        • makeshift global barrier

          hi sblackwell,

          I must say a good try.

          But how work-items are executed inside a compute unit is entirely implementation-dependent. Even if you run just 256 threads, i.e. one work-group, you cannot say whether all 4 wavefronts will execute round-robin, or whether one wavefront will remain stuck in the while loop while the others keep waiting for that wavefront.

          For now, global synchronization can only be implemented by splitting the work into separate kernels.

          I hope it is clear.
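
To illustrate the "different kernels" approach, here is a minimal sketch in plain C rather than real OpenCL host code. Each loop stands in for one kernel launch. In OpenCL the two phases would be two kernels enqueued with clEnqueueNDRangeKernel on the same in-order command queue, so the second starts only after the first has finished. The names phase1, phase2 and run_two_phases, and the arithmetic inside them, are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the first kernel: work-item gid writes its own partial result. */
static void phase1(size_t gid, const int *in, int *tmp)
{
    tmp[gid] = in[gid] * 2;
}

/* Stand-in for the second kernel: work-item gid reads a value that a
 * different work-item wrote in phase 1, so it needs a global sync first. */
static void phase2(size_t gid, size_t n, const int *tmp, int *out)
{
    out[gid] = tmp[gid] + tmp[(gid + 1) % n];
}

/* Emulates two back-to-back kernel launches over n work-items.
 * The boundary between the loops plays the role of the global barrier. */
void run_two_phases(size_t n, const int *in, int *tmp, int *out)
{
    assert(n > 0);

    for (size_t gid = 0; gid < n; ++gid)   /* "launch" of kernel 1 */
        phase1(gid, in, tmp);

    /* Kernel boundary: every phase-1 write to tmp is complete here. */

    for (size_t gid = 0; gid < n; ++gid)   /* "launch" of kernel 2 */
        phase2(gid, n, tmp, out);
}
```

The point of the split is that no work-item ever has to wait on another one inside a kernel, so the result does not depend on how the device schedules wavefronts or work-groups.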

          • makeshift global barrier
            This will only work if you launch no more work-groups than can be resident on the chip at once. Say the device has N SIMDs and each work-group takes up all the resources on a SIMD. A launch of N work-groups should execute this code correctly. If N+1 work-groups are launched, the first N work-groups will loop waiting for scratch[0] to reach nThreads. The last work-group will never get scheduled, because the previous work-groups have not finished and there are no resources left, so the counter never reaches nThreads.

            Hope this explains why the solution you have is not fully generic.
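
The scheduling argument above can be checked with a toy model in plain C. This is a sketch under simplifying assumptions, not a description of any real scheduler: the device has a fixed number of slots, each work-group occupies one slot until it leaves the barrier, and each work-group adds one to the counter when it arrives (standing in for all of its work-items). The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if every work-group gets past the spin barrier within
 * max_steps scheduling rounds, false if the model is still stuck. */
bool spin_barrier_completes(unsigned slots, unsigned groups, unsigned max_steps)
{
    assert(slots > 0);

    unsigned counter  = 0;       /* models scratch[0], counted per work-group */
    unsigned resident = 0;       /* work-groups currently holding a slot */
    unsigned pending  = groups;  /* work-groups not yet scheduled */

    for (unsigned step = 0; step < max_steps; ++step) {
        /* Fill free slots; a newly scheduled group increments on arrival. */
        while (pending > 0 && resident < slots) {
            --pending;
            ++resident;
            ++counter;
        }

        /* Every spinning group sees the target value and can exit. */
        if (counter == groups)
            return true;

        /* Otherwise the resident groups keep spinning and keep their
         * slots, so nothing changes in the next round. */
    }
    return false;
}
```

With 4 slots, spin_barrier_completes(4, 4, 1000) returns true, while spin_barrier_completes(4, 5, 1000) returns false: the fifth work-group never gets a slot, so the counter stays at 4.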
            • makeshift global barrier
              Yeah, that makes perfect sense. I had never thought of it that way. I guess I had assumed (wrongly) that all work-groups were executed concurrently.