After switching from OpenCL to CAL/IL so that I could use all 4 GPUs, I found an issue related to which GPU runs the kernel:
In my application (single context, single thread) the GPU to use is a parameter. When I use GPU 0 or 1, the application runs for 55 to 60 seconds; the same application with the same dataset on GPU 3 or 4 runs for 75 to 90 seconds. In all cases only one GPU is running, and it is the same application launched from the same terminal (I am using Linux x86_64, openSUSE 11.2).
Is there an explanation for this? Has anyone seen something similar? Any insight would be much appreciated.
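For anyone who wants to reproduce the comparison described above, here is a minimal sketch of a timing loop in Python. It is not the actual application: `run_on_gpu` is a hypothetical callable standing in for "launch the workload on the given GPU index", `fake_run` is a placeholder so the script runs anywhere, and the zero-based range of four GPU indices is an assumption.

```python
import statistics
import time
from typing import Callable, Dict, Iterable, List


def time_per_gpu(run_on_gpu: Callable[[int], None],
                 gpu_ids: Iterable[int],
                 repeats: int = 3) -> Dict[int, List[float]]:
    """Run the same workload on each GPU in turn and record wall-clock seconds."""
    timings: Dict[int, List[float]] = {}
    for gpu in gpu_ids:
        samples: List[float] = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_on_gpu(gpu)  # runs are sequential, so only one GPU is busy at a time
            samples.append(time.perf_counter() - start)
        timings[gpu] = samples
    return timings


def summarize(timings: Dict[int, List[float]]) -> Dict[int, float]:
    """Reduce each GPU's samples to a median, which is less sensitive to outliers."""
    return {gpu: statistics.median(samples) for gpu, samples in timings.items()}


def fake_run(gpu: int) -> None:
    # Hypothetical placeholder: a real test would launch the application here
    # with the GPU index as its parameter.
    time.sleep(0.01)


if __name__ == "__main__":
    # Assumption: four GPUs indexed from 0.
    for gpu, median in summarize(time_per_gpu(fake_run, range(4))).items():
        print(f"GPU {gpu}: median {median:.3f} s")
```

Repeating each run a few times and comparing medians should help separate a consistent per-GPU difference from run-to-run noise.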