I have a PC with 2 R4870 video cards running in Linux. Using pyOpenCL, I can run a program on either GPU, but when I try to run 2 simultaneous kernels (one on each card), it seems that for the second queued kernel to run, the first one queued must be finished. I'm expecting that I can queue 2 instances of the same kernel, one on each GPU, and that the total running time should be roughly the same as if I run only one instance, but this is not happening. I have tried using a clFlush after queuing each kernel, but the running time is still the same.
Is it possible to use both (multiple) GPUs simultaneously in OpenCL? How can this be done?
Pyrit also uses separate contexts and queues for all GPUs. It's a bug in AMD's implementation.
Also see http://forums.amd.com/devforum/messageview.cfm?catid=390&threadid=128846&enterthread=y
I have tried using two contexts and two queues (one per GPU), and also one context with two queues (again one per GPU), but the result is always the same: the kernels are not executed simultaneously. Is there any example of how this could be done? I'm using pyOpenCL; can this be a pyOpenCL bug? Or maybe I have a configuration problem on my PC? Is there any example of how multi-GPU servers are being used?
Well, the problem may be that OpenCL is quite lazy: if you queue some work, it does not necessarily begin executing right away. So IMHO it works like this:
enqueue on first GPU
enqueue on second GPU
clFinish(queue1) // execution begins on first GPU; second is lazy and does not execute yet
clFinish(queue2) // now it begins executing on GPU two
So try calling clFlush() after each enqueue (after that it should begin executing without blocking the calling thread), or better, issue the calls from different threads.
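In pyOpenCL, the enqueue-both-then-flush-both pattern described above might look like the sketch below. This is illustrative only: the platform index, the trivial doubling kernel, and the buffer sizes are all placeholder assumptions, and it requires a machine that actually exposes two GPU devices.

```python
import pyopencl as cl

# Assumption: the first platform exposes both GPUs.
platform = cl.get_platforms()[0]
gpus = platform.get_devices(device_type=cl.device_type.GPU)

# One context and one command queue per GPU (separate contexts,
# as Pyrit does).
contexts = [cl.Context([dev]) for dev in gpus]
queues = [cl.CommandQueue(ctx) for ctx in contexts]

# Build the same (placeholder) kernel for each context.
src = "__kernel void k(__global float *a) { a[get_global_id(0)] *= 2.0f; }"
programs = [cl.Program(ctx, src).build() for ctx in contexts]

# Enqueue work on both devices first...
events = []
for prog, queue in zip(programs, queues):
    buf = cl.Buffer(queue.context, cl.mem_flags.READ_WRITE, size=4 * 1024)
    events.append(prog.k(queue, (1024,), None, buf))

# ...then flush both queues so submission is not deferred...
for queue in queues:
    queue.flush()

# ...and only then block waiting for completion.
for evt in events:
    evt.wait()
```

The key point is that no blocking call (clFinish / event.wait) happens until work has been enqueued and flushed on both queues; if the driver still serializes the kernels after this, the problem is in the implementation rather than in the queuing order.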