I recently learned that enqueuing non-blocking commands requires an explicit call to clFlush.

My question is: is it more efficient to load the queue with all the commands I want to dispatch and flush once at the end, or is it better to flush after each command? The commands are:

  • write to the input buffer (225280 B)
  • execute 2 kernels that work on the same read-only buffer
  • read from the 2 result buffers (880 B and 14080 B)