Hello everyone!
As is well known, the OpenCL barrier() function works only within a single workgroup, and there is no direct way to synchronize across workgroups. If it is possible at all, what is the best approach for global synchronization today? Atomics, OpenCL 2.0 features, something else?
Hi,
Yes, you should use atomics. As far as I know, there is a special atomic instruction that uses the faster GDS memory (atom_inc() vs. atomic_inc()), but I may be wrong about that. Test both of them anyway.
GCN chips have hardware support for global synchronization. I've managed to sync 8 wavefronts/CU at a rate of 400 kHz, so it's really fast, wasting only a few hundred cycles.
It's called Global Wave Sync (GWS). Unfortunately, I can't find anything on Google that relates it to OpenCL.
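To make the atomics suggestion concrete, here is a minimal sketch of a one-shot global barrier in OpenCL C. It is not taken from any vendor sample. The names global_barrier, counter and two_phase are hypothetical, and the sketch assumes a 1D NDRange, a counter buffer that the host zeroes before every launch, and (most importantly) that all workgroups are resident on the device at the same time. If the launch has more workgroups than the device can run concurrently, the spinning groups can keep the remaining ones from ever starting, and the kernel hangs.

```c
// One-shot global barrier sketch (OpenCL 1.1+ C, 1D NDRange assumed).
// ASSUMPTION: every workgroup of the launch is resident simultaneously;
// otherwise this can deadlock. `counter` is a hypothetical int buffer
// that the host sets to 0 before each kernel launch.
void global_barrier(volatile __global int *counter, int num_groups)
{
    // Wait for the whole workgroup and fence its earlier memory accesses.
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);

    if (get_local_id(0) == 0) {
        atomic_inc(counter);                        // announce arrival
        // atomic_add(p, 0) is an atomic read that leaves the value unchanged
        while (atomic_add(counter, 0) < num_groups)
            ;                                       // spin until all groups arrived
    }

    // Hold the rest of the workgroup until work-item 0 leaves the spin loop.
    barrier(CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE);
}

// Hypothetical usage: phase 2 reads values written by another workgroup.
__kernel void two_phase(__global float *data,
                        __global float *out,
                        volatile __global int *counter)
{
    size_t i = get_global_id(0);
    data[i] *= 2.0f;                                // phase 1

    global_barrier(counter, (int)get_num_groups(0));

    // Phase 2. Note that OpenCL 1.x does not formally guarantee that plain
    // global writes are visible across workgroups; this tends to work in
    // practice, but treat it as an assumption to verify on your device.
    size_t j = (i + get_local_size(0)) % get_global_size(0);
    out[i] = data[j];
}
```

The function has to be called from uniform control flow, because every work-item of the group must reach both barrier() calls. The counter never returns to zero, so the barrier can be used only once per launch; a reusable version would have to wait for a growing target (num_groups, 2 * num_groups, ...) or reset the counter between uses. With OpenCL 2.0, the atomic_* functions that take an explicit memory order and scope would be the cleaner way to express the same idea.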