I have a question regarding concurrent kernel execution. I went through some old forum posts here and at nVidia, and I am getting mixed answers on this topic.
I know it is possible to run CUDA kernels concurrently on nVidia hardware, but I am getting mixed answers about whether OpenCL kernels can run concurrently.
My intention is to run (for example) 10 instances of the same OpenCL kernel, each pointing to different global memory variables. I was thinking of creating a single out-of-order execution queue for a GPU device and sending all 10 kernels to it, while praying they launch concurrently. Is this possible on ATi hardware? If so, on which HD series?
Thanks in advance.
What worked best for me was using multiple CLContexts for the same CLDevices instead of a single context whose queue is created with the out-of-order flag (CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE). I tested this on the HD 7970, 6970 and 5970, and only with two concurrent kernels per device, which gave 10-20% kernel overlap. My aim was to minimize the gaps between kernels in order to achieve maximum ALU utilization.
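For anyone who wants to put a number on overlap like the 10-20% above, here is a minimal sketch in C. It assumes you already have start/end timestamps in nanoseconds for each kernel, for example from clGetEventProfilingInfo with CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END on queues created with CL_QUEUE_PROFILING_ENABLE. The `kernel_span` type and both function names are made up for illustration, and measuring against the shorter kernel is my own choice; I don't know how the figures above were actually measured, or whether timestamps from separate contexts are directly comparable on every driver.

```c
#include <assert.h>
#include <stdint.h>

/* Start/end timestamps of one kernel execution, in nanoseconds.
   Hypothetical helper type; values are assumed to come from
   OpenCL event profiling on the same device. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
} kernel_span;

/* Nanoseconds during which both kernels were executing at once. */
static uint64_t overlap_ns(kernel_span a, kernel_span b)
{
    assert(a.end_ns >= a.start_ns && b.end_ns >= b.start_ns);
    uint64_t lo = a.start_ns > b.start_ns ? a.start_ns : b.start_ns; /* later start */
    uint64_t hi = a.end_ns < b.end_ns ? a.end_ns : b.end_ns;         /* earlier end */
    return hi > lo ? hi - lo : 0;
}

/* Overlap as a percentage of the shorter kernel's run time. */
static double overlap_percent(kernel_span a, kernel_span b)
{
    uint64_t len_a = a.end_ns - a.start_ns;
    uint64_t len_b = b.end_ns - b.start_ns;
    uint64_t shorter = len_a < len_b ? len_a : len_b;
    if (shorter == 0) {
        return 0.0; /* avoid dividing by zero for empty spans */
    }
    return 100.0 * (double)overlap_ns(a, b) / (double)shorter;
}
```

Fill one `kernel_span` per kernel from its event and call `overlap_percent(a, b)`. A result of 0 means the kernels ran strictly back to back, which is what you would see if the driver serializes them.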