Is there any way to synchronize (barrier) threads in a kernel?
You might want to look up the IL document in CAL.
There is just one instruction that serves that purpose:
FENCE, which can synchronize threads and/or LDS instructions.
How is this done from Brook+?
Brook+ doesn't expose the local data share, nor does it allow reading from the output scatter buffer, so it has no need for a synchronization barrier.
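To illustrate why a barrier only matters once threads can read data written by other threads, here is a rough CPU-side analogy in Python. It is not Brook+ or CAL IL code; the function name `run_with_barrier` and the `shared` list (standing in for a local data share) are made up for the example.

```python
import threading

def run_with_barrier(num_threads=4):
    """Each thread writes its own slot, waits at a barrier, then reads a neighbor's slot."""
    shared = [0] * num_threads   # hypothetical stand-in for a local data share
    result = [0] * num_threads
    barrier = threading.Barrier(num_threads)

    def worker(tid):
        # Phase 1: every thread writes only its own slot.
        shared[tid] = tid * 10
        # No thread continues until all threads have finished phase 1.
        barrier.wait()
        # Phase 2: read a slot that a different thread wrote.
        neighbor = (tid + 1) % num_threads
        result[tid] = shared[neighbor]

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result

if __name__ == "__main__":
    print(run_with_barrier(4))
```

Without the `barrier.wait()` call, a thread could read its neighbor's slot before the neighbor has written it. If each thread only ever touched its own output element, as in a Brook+ kernel, phase 2 would not exist and the barrier would be unnecessary.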