I'm adding a producer-consumer design to an OpenCL program for multiple GPUs, and I have managed to make it work at the host-device synchronization level. Now I need finer-grained handling of commands across the queues, without adding any synchronization between host and device. The only way I can think of is to query how many commands remain in each device's command queue, find the device with the fewest remaining, and assign the next command to that device.
How can I find out how many commands are waiting to be processed in a command queue?
This is for OpenCL 1.2.
For now, I'm learning how to use callbacks with markers to count commands. Does a firing callback stall its command queue? I hope not, because I need finer-grained control with fewer bubbles.
Trying something like this:
but I'm not sure if it's the way to do it.
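For illustration, the host-side bookkeeping behind the counting idea could be sketched roughly as below. This is only a sketch under assumptions, not a confirmed solution: `NUM_DEVICES`, `pending`, `on_enqueue`, `on_complete` and `least_loaded_device` are hypothetical names, and the OpenCL calls are left out so the snippet compiles on its own. In a real program, `on_complete` would be called from a function registered with `clSetEventCallback` on the event of a marker enqueued with `clEnqueueMarkerWithWaitList`.

```c
#include <stdatomic.h>

#define NUM_DEVICES 3  /* hypothetical number of GPUs / command queues */

/* One counter of not-yet-completed commands per device queue.
 * Atomic because OpenCL callbacks may run on a different thread. */
static atomic_int pending[NUM_DEVICES];

/* Call right after enqueueing a command (and its marker) on device `dev`. */
static void on_enqueue(int dev) {
    atomic_fetch_add(&pending[dev], 1);
}

/* Call when the marker for a command on device `dev` reports CL_COMPLETE,
 * i.e. from inside the event callback (device index passed via user_data). */
static void on_complete(int dev) {
    atomic_fetch_sub(&pending[dev], 1);
}

/* Return the index of the device with the fewest pending commands.
 * Ties go to the lowest index. */
static int least_loaded_device(void) {
    int best = 0;
    int best_count = atomic_load(&pending[0]);
    for (int d = 1; d < NUM_DEVICES; ++d) {
        int count = atomic_load(&pending[d]);
        if (count < best_count) {
            best = d;
            best_count = count;
        }
    }
    return best;
}
```

The producer would call `least_loaded_device()` before each enqueue and `on_enqueue()` right after it. The counts are only a snapshot, since a callback can decrement a counter between the read and the enqueue, so the choice is approximate.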