

rj_marques
Journeyman III

Command Queue performance

Greetings

Please consider the following scenario:

A thread pool of N host threads, each responsible for an independent OpenCL execution, i.e., managing its own memory objects, kernel, etc.

Regarding command queues, which of the following is more efficient in this scenario?

- A single shared (in-order) queue, on which every thread issues the commands for its execution.

- A single shared out-of-order command queue, on which every thread issues the commands for its execution and synchronizes them internally via events.

- N command queues, one per thread.

In my experience, having multiple command queues may hurt performance as the number of queues grows. On the other hand, from what I've read, AMD's OpenCL implementation does not support concurrent kernel execution.

Thanks in advance for your opinions. =D


You can pipeline buffer setup, data transfer and kernel execution. We have apps that do this and they are very efficient.

So basically what you would do is something like this:

setup buffer N & N + 1
enqueue buffer N
enqueue kernel N

setup buffer N + 2
enqueue buffer N + 1
enqueue kernel N + 1
readback buffer N

setup buffer N + 3
enqueue buffer N + 2
enqueue kernel N + 2
readback buffer N + 1

etc...
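The ordering above can be sketched as a small C routine that only generates the host-side command order for a number of batches; it does not call OpenCL. The names `pipeline_schedule`, `struct step` and the stage labels are hypothetical, chosen for illustration. Reading "enqueue buffer" as `clEnqueueWriteBuffer`, "enqueue kernel" as `clEnqueueNDRangeKernel` and "readback" as `clEnqueueReadBuffer` is an assumption about what the post means. Setup runs two batches ahead of the upload, and the readback trails the kernel by one batch.

```c
#include <assert.h>
#include <stdio.h>

/* Stage labels follow the wording of the post (hypothetical names). */
enum stage { SETUP, ENQUEUE_BUFFER, ENQUEUE_KERNEL, READBACK };

static const char *const stage_name[] = {
    "setup buffer", "enqueue buffer", "enqueue kernel", "readback buffer"
};

struct step {
    enum stage stage;
    int batch;
};

/* Append one step if there is room; always count it so the caller
 * can size its array from the return value. */
static void push(struct step *out, int cap, int *len, enum stage s, int batch)
{
    if (*len < cap) {
        out[*len].stage = s;
        out[*len].batch = batch;
    }
    (*len)++;
}

/* Build the pipelined host-side order for n batches.
 * Returns the total number of steps (4 per batch). */
static int pipeline_schedule(int n, struct step *out, int cap)
{
    int len = 0;
    assert(out != NULL || cap == 0);
    if (n <= 0)
        return 0;
    /* Prologue: prepare the first two batches up front. */
    push(out, cap, &len, SETUP, 0);
    if (n > 1)
        push(out, cap, &len, SETUP, 1);
    for (int i = 0; i < n; i++) {
        /* Host prepares batch i + 1 while earlier work is queued. */
        if (i >= 1 && i + 1 < n)
            push(out, cap, &len, SETUP, i + 1);
        push(out, cap, &len, ENQUEUE_BUFFER, i);
        push(out, cap, &len, ENQUEUE_KERNEL, i);
        /* Read back the previous batch, one iteration behind. */
        if (i >= 1)
            push(out, cap, &len, READBACK, i - 1);
    }
    /* Epilogue: drain the last result. */
    push(out, cap, &len, READBACK, n - 1);
    return len;
}

static void print_schedule(const struct step *s, int len)
{
    for (int i = 0; i < len; i++)
        printf("%s %d\n", stage_name[s[i].stage], s[i].batch);
}
```

For example, `pipeline_schedule(3, steps, 16)` followed by `print_schedule(steps, 12)` lists the same order as the post for batches 0 to 2. The sketch only captures ordering; whether transfers and kernels actually overlap depends on using non-blocking enqueues with events and, on some implementations, separate queues for transfers and compute, which it does not model.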
