Is this an effective way to achieve global synchronization on the GPU?

Discussion created by redditisgreat on Feb 15, 2011
Latest reply on Feb 15, 2011 by redditisgreat
I implemented a global barrier across work-groups using atomic operations.

My initial testing seems to indicate that it works.

Is there a way to do the same without forcing the compiler into "complete path" memory mode?

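global uint sema = 0;             // shared counter in global memory, starts at zero

if( get_local_id(0)==0 )
    atomic_inc( &sema );          // one arrival per work-group
while( sema % num_groups )        // spin until every work-group has arrived
    if( get_local_id(0)==0 )
        atomic_add( &sema, 0 );   // atomic add of zero while polling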
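
For context, here is a fuller sketch of the same counting idea written as a complete kernel. It is a variant, not the exact code above: only work-item 0 of each group spins, and the rest of the group waits at a local barrier. The names global_barrier and two_phase, and passing sema as a kernel argument, are assumptions made for illustration. It also assumes a 1-D launch, a counter zeroed by the host before the launch, and that every work-group is resident on the device at the same time; otherwise the spin loop may never finish.

```c
// Sketch only, not the code from the post. Assumptions: a 1-D launch,
// `sema` points at a uint in global memory that the host zeroed before
// the launch, every work-group is resident on the device at the same
// time, and the barrier is crossed once per launch.
void global_barrier(volatile __global uint *sema, uint num_groups)
{
    // Wait until every work-item of this group has reached the barrier.
    barrier(CLK_GLOBAL_MEM_FENCE);

    if (get_local_id(0) == 0) {
        atomic_inc(sema);                          // one arrival per work-group
        while (atomic_add(sema, 0u) % num_groups)  // atomic read of the counter
            ;                                      // spin until all groups arrived
    }

    // Hold the rest of the group until work-item 0 has seen every arrival.
    barrier(CLK_GLOBAL_MEM_FENCE);
}

// Hypothetical kernel showing where the call would sit.
__kernel void two_phase(volatile __global uint *sema, __global uint *data)
{
    size_t gid = get_global_id(0);
    data[gid] = (uint)gid;                         // phase 1: each item writes
    global_barrier(sema, (uint)get_num_groups(0));
    // phase 2 would go here, after all groups have passed the barrier
}
```

Because the loop tests the counter modulo num_groups, this sketch is only safe for one crossing per launch: a fast group that re-enters the barrier could bump the counter before a slow group has seen it reach a multiple of num_groups. Note also that OpenCL 1.x does not guarantee global memory consistency between work-groups within a single kernel, so whether phase 2 actually sees the phase 1 writes of other groups depends on the device.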