This is just to share some insights about using SDK 2.4 for multiGPU.
System: MSI 790FX-GD70, AMD Phenom II 945, 4 GB RAM, openSUSE 11.2 x86_64, SDK 2.4, Catalyst 11.3, 2x HD5970
There is an improvement over SDK 2.3: now 3 or 4 instances of the same application, each using a different GPU, show no drop in performance compared with 1 or 2 instances.
When I tried 4 instances of the application running in parallel the machine hung, but the cause turned out to be that a 1000W PSU was not enough for the system; powering the second card from a separate 700W PSU solved the problem. So OpenCL multiGPU seems to work, at least with one thread, one context and one GPU per instance, and each context having its own buffers, kernels, etc.
This has worked previously, as far as I was concerned. My config has 3x HD5970, and the fix for the abysmal scaling of the application was setting the environment variable GPU_USE_SYNC_OBJECTS=1. In my particular application even the results were correct. I used MPI to run things in parallel.
I wasn't so lucky...
With SDK 2.3, even with GPU_USE_SYNC_OBJECTS=1, I had a performance drop of around 1.5x on the 3rd and 4th GPUs, so I thought it could be helpful to show that this was solved for me...
There are 4 instances of the same application, each of which receives the number of the GPU to use as a parameter.
Each instance creates one context for its GPU and one set of buffers, kernels, events, etc.; nothing is shared between the instances.
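
To make the setup above a bit more concrete, here is a rough Python launcher sketch. It is not the launcher actually used here; it assumes the application accepts the GPU number as its last command-line argument, and the function names are made up for illustration. It only starts the independent processes; creating the context, buffers and kernels happens inside each instance. Setting GPU_USE_SYNC_OBJECTS is optional and is included only because the variable came up in this thread.

```python
import os
import subprocess


def build_commands(app_argv, gpu_count):
    """Return one command line per GPU, with the GPU number appended as the last argument."""
    return [list(app_argv) + [str(gpu)] for gpu in range(gpu_count)]


def launch_instances(app_argv, gpu_count, use_sync_objects=False):
    """Start one independent process per GPU and wait for all of them.

    Returns the exit codes in GPU order. Nothing is shared between the
    processes; each one is expected to create its own OpenCL context.
    """
    env = dict(os.environ)
    if use_sync_objects:
        # Variable mentioned in this thread; its effect depends on the SDK/driver.
        env["GPU_USE_SYNC_OBJECTS"] = "1"
    # Launch all instances first so they run in parallel, then wait for each.
    procs = [subprocess.Popen(cmd, env=env)
             for cmd in build_commands(app_argv, gpu_count)]
    return [p.wait() for p in procs]
```

For example, `launch_instances(["./my_app"], 4)` would start `./my_app 0` through `./my_app 3` in parallel, where `./my_app` is a placeholder for the real binary.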