I implemented a barrier across work-groups using atomic operations.
My initial testing seems to indicate that it works.
Is there a way to do the same without forcing the compiler into "complete path" memory mode?
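global uint sema = 0;  // arrival counter shared by all work-groups

// One work-item per group signals that its group has arrived.
if (get_local_id(0) == 0)
    atomic_inc(&sema);

// Spin until the counter is a multiple of num_groups, i.e. every group has arrived.
while (sema % num_groups)
    if (get_local_id(0) == 0)
        atomic_add(&sema, 0);  // adding 0 leaves sema unchanged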