
sblackwell
Journeyman III

makeshift global barrier

trouble reading from global memory after atom_inc

I know there's no global barrier in OpenCL, but I'm trying to create a workaround using the following code:

void barrier(__global uint* scratch) {
  uint nThreads = get_global_size(0);
  atom_inc(scratch);
  /* this loop never terminates */
  while(scratch[0] < nThreads) {
    continue;
  }
}

The idea is that each thread loops until all threads have incremented that memory.

I know the memory is being incremented, because scratch[0] == nThreads when I read scratch back to host memory, but the loops never terminate. When I have the threads write out the value of scratch[0] elsewhere, they each just print the result of their own atom_inc.

I know I can normally read just fine from global memory, but what am I missing here?
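A side note on the read itself: in OpenCL C the compiler is free to cache scratch[0] in a register across loop iterations, so the spin loop may never observe increments made by other work-items even when they do run. Qualifying the pointer volatile forces a fresh read from global memory on each iteration. A sketch of that change, using the OpenCL 1.1 core name atomic_inc (this does not fix the scheduling problem discussed in the replies):

```c
/* OpenCL device code -- sketch only. volatile forces scratch[0]
   to be re-read from global memory on every loop iteration
   instead of being cached in a register. This still deadlocks
   whenever not all work-groups are resident on the device at once. */
void barrier_spin(volatile __global uint *scratch) {
  uint nThreads = get_global_size(0);
  atomic_inc(scratch);              /* count this work-item in */
  while (scratch[0] < nThreads)
    ;                               /* spin until all arrive */
}
```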

3 Replies
himanshu_gautam
Grandmaster

Hi sblackwell,

A good try, I must say.

But how work-items are executed inside a compute unit is entirely implementation-dependent. Even if you run just 256 threads, i.e. one work-group, you cannot say whether all 4 wavefronts will execute in round-robin fashion or whether one wavefront will spin in the while loop while the others wait behind it and never get scheduled.

For now, global synchronization can only be implemented by splitting the work across separate kernel launches.
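On an in-order command queue, each kernel launch acts as a global barrier: no work-item of launch i+1 starts before every work-item of launch i has finished. A host-side sketch of that pattern (it assumes queue, kernel, gsize, and lsize have already been set up; nPhases is a hypothetical phase count):

```c
/* Host-side sketch: one kernel launch per phase; the boundary
   between launches is the global synchronization point. */
for (int phase = 0; phase < nPhases; phase++) {
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                           &gsize, &lsize, 0, NULL, NULL);
}
clFinish(queue);  /* wait for the last phase to complete */
```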

I hope it is clear.


sblackwell,
This will only work if you run exactly enough work-groups to fill up the chip once, or fewer. Say the device has N SIMDs and each work-group takes up all the resources on a SIMD. A launch size of N work-groups should execute this code correctly. If N+1 work-groups are launched, the first N work-groups will loop waiting for scratch[0] to reach nThreads. The last work-group will never be scheduled, because the previous work-groups have not finished and there are no resources left for it.

Hope this makes sense as to why the solution you have is not fully generic.
sblackwell
Journeyman III

Yeah that makes perfect sense. I had never thought of it that way. I guess I had assumed (wrongly) that all work groups were executed concurrently.

Thanks!