I've written a kernel that I would like to run on both the GPU and the CPU concurrently. First, I launch the GPU kernel, and then immediately launch the CPU kernel. The timers I have indicate that the GPU kernel doesn't block and the CPU kernel is indeed queued up immediately.
Unfortunately, it seems that the kernels don't run concurrently, or else I'm timing something incorrectly. What I'd like to see is the following (for instance):
Time for a single kernel to run on the GPU = 5 sec
Time for a single kernel to run on the CPU = 8 sec
Time for both, running concurrently = around 8 sec (but I'm seeing 13 sec)
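Note that the observed 13 sec is exactly 5 + 8, which is what fully serialized execution would give, whereas full overlap would be bounded by the slower device. A small sketch of that arithmetic (the function names are hypothetical and only illustrate the two cases):

```c
/* Expected wall-clock time if the two kernels run one after the other. */
double serialized_time(double gpu_sec, double cpu_sec)
{
    return gpu_sec + cpu_sec;
}

/* Expected wall-clock time if the two kernels overlap completely:
 * the slower device determines the total. */
double overlapped_time(double gpu_sec, double cpu_sec)
{
    return gpu_sec > cpu_sec ? gpu_sec : cpu_sec;
}
```

With the numbers above, `serialized_time(5.0, 8.0)` gives 13 and `overlapped_time(5.0, 8.0)` gives 8, so the measurement matches the serialized case.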
My question is whether clEnqueueNDRangeKernel() will dispatch a kernel to the second device in a two-device system if the first device is already running a kernel. Thanks!
Edit: A related question: is it possible to have two queues, or is there only one queue because there is only one global clEnqueueNDRangeKernel()?
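For context on the queue question: clEnqueueNDRangeKernel() takes a command queue as its first argument, and each command queue is created for one specific device. Below is a minimal sketch of a two-queue launch under that assumption. The helper name `run_on_both` and its parameters are hypothetical, error handling is reduced to returning the first failure, and whether the two devices actually overlap still depends on the OpenCL implementation.

```c
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

/* Hypothetical helper: enqueue a 1-D NDRange on two devices, then wait.
 * Assumes each queue was created for its own device, for example
 *   gpu_q = clCreateCommandQueue(gpu_ctx, gpu_dev, 0, &err);
 *   cpu_q = clCreateCommandQueue(cpu_ctx, cpu_dev, 0, &err);
 * and that each kernel object belongs to the same context as the queue
 * it is enqueued on. */
cl_int run_on_both(cl_command_queue gpu_q, cl_kernel gpu_k,
                   cl_command_queue cpu_q, cl_kernel cpu_k,
                   size_t global_size)
{
    cl_int err;

    /* Enqueue on the GPU queue and flush so the work is submitted now. */
    err = clEnqueueNDRangeKernel(gpu_q, gpu_k, 1, NULL, &global_size,
                                 NULL, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;
    err = clFlush(gpu_q);
    if (err != CL_SUCCESS) return err;

    /* Enqueue on the CPU queue and flush it as well. */
    err = clEnqueueNDRangeKernel(cpu_q, cpu_k, 1, NULL, &global_size,
                                 NULL, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;
    err = clFlush(cpu_q);
    if (err != CL_SUCCESS) return err;

    /* Block only after both queues have been flushed. */
    err = clFinish(gpu_q);
    if (err != CL_SUCCESS) return err;
    return clFinish(cpu_q);
}
```

The point of the sketch is the ordering: both enqueues and both clFlush() calls happen before any blocking call, so a blocking wait on the first queue cannot delay submission to the second.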