Some time ago nou published a very good post describing the steps to configure a multi-GPU environment on Linux (http://forums.amd.com/forum/messageview.cfm?catid=390&threadid=139928&forumid=9).
Now I am running my application on Linux and I am facing a strange situation. I have 2x HD5970 boards, so my application takes a parameter from 0 to 3 that selects the device on which to run the kernel. If I open a terminal and run the application, everything works fine for each of the four values (0 to 3). But if I open four terminals and run four instances of the application at the same time (each one with its own device parameter), the total running time becomes 2x slower.
So I suspect that something is being serialized, but every instance of the application has its own context, command queues, buffers, etc., so I see no reason for serialization issues.
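One way to put numbers on this suspicion is to time the four runs back to back and then all at once. The following is only a minimal sketch in Python, assuming the application is started as `./my_opencl_app <device index>`; that name and the helper functions are placeholders for illustration, not the real binary or any existing tool.

```python
import subprocess
import sys
import time


def device_commands(app, device_ids):
    """Build one command line per device index (assumed CLI: `app <index>`)."""
    return [[app, str(i)] for i in device_ids]


def run_sequential(commands):
    """Run each command to completion before starting the next.

    Returns the total wall-clock time in seconds.
    """
    start = time.perf_counter()
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def run_concurrent(commands):
    """Start all commands at once and wait for every one of them.

    Returns the wall-clock time in seconds until the last one exits.
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    if any(code != 0 for code in codes):
        raise RuntimeError(f"at least one run failed, exit codes: {codes}")
    return elapsed


def compare(commands):
    """Return (sequential_time, concurrent_time, ratio) for the same commands."""
    seq = run_sequential(commands)
    con = run_concurrent(commands)
    return seq, con, seq / con
```

Calling `compare(device_commands("./my_opencl_app", range(4)))` returns both wall times and their ratio. If the four devices really work independently and the runs take similar time, the ratio should come out close to 4; a ratio close to 1 would suggest the instances are effectively running one after another, and something in between points to partial contention (for example on the host side or in the driver).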
So I kindly ask: Is there any function in the OpenCL API that gets serialized no matter how many contexts there are? Is there a better approach to multi-GPU? I have read the OpenCL programming guide, and for multi-GPU it states that one can use multiple contexts because the OpenCL runtime starts a new thread for each of them, and all functions (except kernel functions) are thread safe. I would really appreciate any insight into these matters. Thanks in advance for your help.