I am trying to impose a partial order on work group execution using atomics, and I need a way to make one work group wait on another. To do this, the waiting work group busy-waits on an atomic value that the other work group will set. However, if both work groups are assigned to the same SIMD queue, the waiting work group may block the very work group it is waiting on, causing a deadlock. So inside the busy loop, the waiting work group needs a way to say "swap me out and let other work groups take my place", i.e. to yield to other work groups.
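To make the pattern concrete, here is a rough CPU-side analogy in C++, not OpenCL code. Two threads stand in for the two work groups, and `std::this_thread::yield()` plays the role of the "swap me out" primitive I am looking for on the GPU. The names `run_demo`, `flag`, and `payload` are made up for illustration only.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical analogy: CPU threads stand in for GPU work groups.
// Returns the value the waiter observed after the flag was set.
int run_demo() {
    std::atomic<int> flag{0};  // the atomic value the waiting side polls
    int payload = 0;           // data published before the flag is set
    int observed = -1;         // what the waiter reads once released

    // "Waiting work group": spins on the flag, yielding on every iteration
    // so that the scheduler may run the thread it depends on.
    std::thread waiter([&] {
        while (flag.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();  // the step with no obvious GPU equivalent
        }
        observed = payload;  // safe: the acquire load pairs with the release store
    });

    // "Signalling work group": publishes its result, then sets the flag.
    std::thread signaller([&] {
        payload = 42;
        flag.store(1, std::memory_order_release);
    });

    waiter.join();
    signaller.join();
    assert(flag.load() == 1);
    return observed;
}
```

On a CPU the yield is only a hint, because the operating system preempts threads anyway and the waiter cannot starve the signaller indefinitely. My problem is that, as far as I can tell, a work group occupying a SIMD queue is not preempted in this way, which is why the yield step matters so much there.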
Is there a way to do this on AMD hardware? I know this is outside the OpenCL spec, so I'm just experimenting at the moment.