I have one parent kernel and three child kernels, where the child kernels are independent of each other,
but they can only start executing once the parent kernel completes.
The parent runs on command queue 0, and the three child kernels run on command queues 1,2 and 3 respectively.
I use events to synchronize parent and children.
There seems to be a race condition or bug where data sometimes gets corrupted at the end of the pipeline.
It happens roughly once every 10 runs of my application.
If I put all kernels on command queue 0, then problem goes away.
As this design is at the core of my application, it is hard to create a reproducer.
So, just want to make this vague bug report - perhaps this kind of scenario can be added to AMD unit tests.