MultiGPU implementation in Linux?

Discussion created by afo on Oct 5, 2010
Latest reply on Oct 21, 2010 by datlatec
Which is the best strategy to implement multigpu in linux?

Hello all,

Some time ago nou published a very good post telling the steps to configure a multigpu enviroment in linux (

Now I am running my application in linux and I am facing a strange situation: I have 2xHD5970 boards, so my application receives a parameter from 0 to 3 to select the device in which to run the kernel. If I open a terminal and run the application, everything goes fine for the 4 numbers (0 to 3); But If I open four terminals, and run 4 instances of the application at the same time (everyone with its corresponding parameter), the total running time goes 2x times slower.

So, I suspect that something is going serialized, but every instance of the application has its own context , command queues, buffers, etc. so there is no reason to have serialization issues.

So I kindly ask: Is there any function in the OpenCl api that gets serialized no matter the number of contexts? Is there a better approach to do multigpu? I have read the OpenCL programming guide, and for multigpu it states that one can do multiple contexts because the opencl runtime starts a new thread for each of them, and all functions (except kernel functions) are thread safe. I will really apreciate any insight about these matters. Thanks in advance for your help.

best regards,