I have a PC with 2 R4870 video cards running Linux. Using pyOpenCL, I can run a program on either GPU, but when I try to run 2 kernels simultaneously (one on each card), the second queued kernel seems unable to start until the first one has finished. I expected to be able to queue 2 instances of the same kernel, one on each GPU, and get a total running time roughly equal to that of a single instance, but this is not happening. I have tried calling clFlush after queuing each kernel, but the running time is still the same.
Is it possible to use both (or, more generally, multiple) GPUs simultaneously in OpenCL? How can this be done?
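The setup described above can be sketched in pyOpenCL roughly as follows. It is a minimal sketch under several assumptions: both cards are exposed by the first platform, the `busy` kernel is a hypothetical stand-in for the real workload, and each device gets its own command queue and its own kernel object. Every queue is flushed right after its enqueue, and only then does the host wait on all of them. Whether the two kernels actually overlap still depends on the driver, so timing the two-GPU run against a single-GPU run is the real test.

```python
import time

# Hypothetical busy-loop kernel standing in for the real workload.
KERNEL_SRC = """
__kernel void busy(__global float *out, const int iters) {
    int gid = get_global_id(0);
    float x = (float)gid;
    for (int i = 0; i < iters; ++i) {
        x = x * 1.000001f + 0.5f;
    }
    out[gid] = x;
}
"""


def run_on_all_gpus(n=1 << 20, iters=10000, max_devices=None):
    """Enqueue the same kernel once per GPU, then wait for all of them.

    Returns (elapsed_seconds, list_of_host_result_arrays).
    """
    # Imported lazily so this module can be loaded without OpenCL present.
    import numpy as np
    import pyopencl as cl

    # Assumption: the first platform exposes every GPU we want to use.
    platform = cl.get_platforms()[0]
    devices = platform.get_devices(device_type=cl.device_type.GPU)[:max_devices]

    # One context spanning all GPUs, one command queue per device.
    ctx = cl.Context(devices)
    prg = cl.Program(ctx, KERNEL_SRC).build()
    queues = [cl.CommandQueue(ctx, device=d) for d in devices]

    # A separate kernel object per device, so arguments never clash.
    kernels = [cl.Kernel(prg, "busy") for _ in devices]
    host_out = [np.empty(n, dtype=np.float32) for _ in devices]
    dev_out = [cl.Buffer(ctx, cl.mem_flags.WRITE_ONLY, h.nbytes) for h in host_out]

    start = time.perf_counter()
    for q, k, buf in zip(queues, kernels, dev_out):
        k.set_args(buf, np.int32(iters))
        cl.enqueue_nd_range_kernel(q, k, (n,), None)
        q.flush()  # submit to this device now, without blocking the host
    for q in queues:
        q.finish()  # only after every device has work, wait for all of them
    elapsed = time.perf_counter() - start

    # Blocking copies back to the host once all kernels are done.
    for q, buf, h in zip(queues, dev_out, host_out):
        cl.enqueue_copy(q, h, buf)
    return elapsed, host_out
```

For comparison, `run_on_all_gpus(max_devices=1)` times the same kernel on a single GPU; if the kernels really run concurrently, the two elapsed times should be close.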