I recently learned that enqueuing non-blocking commands requires an explicit call to clFlush.
My question is: is it more efficient to enqueue all the commands I want to dispatch and flush once at the end, or is it better to flush after each command? The commands are:
- write to the input buffer (225280 B)
- execute 2 kernels that work on the same read-only buffer
- read from the 2 result buffers (880 B and 14080 B)