I'm a little confused about the behavior of an "in-order" command queue.
Section 4.7.4 of the APP SDK programming guide (page 4-36) states: "
Command-queues that are configured to execute in-order are guaranteed to
complete execution of each command before the next command begins. This
synchronization guarantee can often be leveraged to avoid explicit
clWaitForEvents() calls between command submissions. Using
clWaitForEvents() requires intervention by the host CPU and additional
synchronization cost between the host and the GPU; by leveraging the in-order
queue property, back-to-back kernel executions can be efficiently handled
directly on the GPU hardware."
However, the same page also says: "AMD Southern Islands GPUs can execute multiple kernels simultaneously when there are no dependencies."
My questions are:
- If the command queue is in-order, how can multiple kernels run simultaneously? Shouldn't they be executed sequentially?
- DMA commands seem to run asynchronously with respect to other commands in the same queue, so developers have to use a blocking API call or events to check their status. How is "in-order" defined in that case?
I'm hoping to get a clear explanation of the in-order queue behavior.