Please consider the following scenario:
A thread pool composed of N host threads, each responsible for an independent OpenCL execution, i.e., managing its own memory objects, kernel, etc.
Regarding command queues, which of the following is most efficient in this scenario?
- Having a single shared in-order queue, on which every thread issues commands for its execution.
- Having a single shared out-of-order command queue, on which every thread issues commands for its execution and synchronizes its own commands via events.
- Having N command queues, one for each thread.
In my experience, having multiple command queues may hamper performance as the number of queues grows. On the other hand, from what I've read, AMD's OpenCL implementation does not support concurrent kernels.
Thanks in advance for your opinions. =D