You might want to look up the IL documentation in CAL.
There is just one instruction for that purpose: fence, which synchronizes threads and/or the LDS (local data share).
How is this done from Brook+?
Brook+ doesn't expose the local data share, nor does it allow reading from the output scatter buffer, so it never needs to insert a synchronization barrier.
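As a rough illustration of why a barrier only matters when work-items share writable memory, here is a Python sketch. It is an analogy, not CAL IL or Brook+ code: threads stand in for work-items, an ordinary list stands in for the LDS, and threading.Barrier plays the role of the fence instruction. The function name neighbor_exchange is hypothetical.

```python
import threading

def neighbor_exchange(values):
    """Each thread ('work-item') writes its value into a shared list (standing
    in for the LDS), waits at a barrier, then reads its right-hand neighbor's slot."""
    n = len(values)
    if n == 0:
        return []
    lds = [None] * n              # hypothetical stand-in for the local data share
    out = [None] * n              # each thread writes only its own output slot
    barrier = threading.Barrier(n)

    def work_item(tid):
        lds[tid] = values[tid]          # phase 1: write own slot
        barrier.wait()                  # no thread reads until all writes are done
        out[tid] = lds[(tid + 1) % n]   # phase 2: read the neighbor's slot

    threads = [threading.Thread(target=work_item, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

print(neighbor_exchange([10, 20, 30, 40]))  # [20, 30, 40, 10]
```

Without the barrier.wait() call, a thread could read its neighbor's slot before the neighbor has written it. Since Brook+ exposes neither the LDS nor reads from the scatter buffer, its kernels have no equivalent hazard to guard against.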