Since I've installed a dual-GPU graphics card today I'm wondering how to utilize both GPUs using two independent host threads. The problem is that the program crashes without any error output as soon as I start the second thread, which initializes a completely new platform. The only difference between the two threads is that thread A uses device 0 and thread B uses device 1.
I know that I could use a single platform, create one context, and add the second device to that context. But that would require some major changes to my code. Therefore, I'm looking for the simplest way.
What works is: Compile program with one thread and choose device 0. Execute program. Compile program with one thread and device 1. Execute program.
Now both programs run in their own processes and that works fine. But as soon as I try to utilize both devices in one process it fails.
As I said both threads have their own cl_platform objects (and of course all other objects belonging to the platform). Actually they don't share anything at all. I don't understand why there are problems.
PS: Sometimes I do get a message from the system:
pure virtual method called
terminate called without an active exception
Aborted (core dumped)
That explains a lot ... There are many clSetKernelArg() calls in my program. Thanks.
edit: Quote from OpenCL documentation:
"An OpenCL API call is considered to be thread-safe if the internal state as managed by OpenCL remains consistent when called simultaneously by multiple host threads. OpenCL API calls that are thread-safe allow an application to call these functions in multiple host threads without having to implement mutual exclusion across these host threads i.e. they are also re-entrant-safe.
All OpenCL API calls are thread-safe except clSetKernelArg, which is safe to call from any host thread, and is safe to call re-entrantly so long as concurrent calls operate on different cl_kernel objects. However, the behavior of the cl_kernel object is undefined if clSetKernelArg is called from multiple host threads on the same cl_kernel object at the same time."
Since each thread in my program has its own cl_kernel objects, clSetKernelArg() shouldn't cause any problems.
The problem with clSetKernelArg() is that, between setting an argument and enqueueing the kernel, another thread can change that argument, so the kernel ends up running with the wrong arguments. If you hold a lock on the cl_kernel for the whole set-and-enqueue sequence, you can share one kernel between threads.
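To illustrate the locking idea, here is a minimal sketch in C with pthreads. It does not call OpenCL at all: `shared_kernel`, `fake_set_arg` and `fake_enqueue` are hypothetical stand-ins I made up for a shared cl_kernel, clSetKernelArg() and clEnqueueNDRangeKernel(), so that the pattern can be compiled and run without a GPU. The only point is that set and enqueue sit inside one critical section.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a cl_kernel shared between host threads.
   It only remembers the last argument value that was set. */
typedef struct {
    int arg;               /* plays the role of a kernel argument slot */
    pthread_mutex_t lock;  /* guards the whole set-then-enqueue sequence */
} shared_kernel;

/* Hypothetical stand-ins, NOT real OpenCL calls. */
static void fake_set_arg(shared_kernel *k, int value) { k->arg = value; }
static int fake_enqueue(const shared_kernel *k) { return k->arg; }

/* Set the argument and enqueue as one critical section.
   Returns the argument value the "launch" actually saw. */
int launch_locked(shared_kernel *k, int value) {
    pthread_mutex_lock(&k->lock);
    fake_set_arg(k, value);
    int seen = fake_enqueue(k);
    pthread_mutex_unlock(&k->lock);
    return seen;
}

typedef struct {
    shared_kernel *k;
    int value;       /* the argument this thread always sets */
    int iterations;
    int mismatches;  /* launches that saw another thread's argument */
} worker;

static void *worker_main(void *p) {
    worker *w = p;
    for (int i = 0; i < w->iterations; ++i) {
        if (launch_locked(w->k, w->value) != w->value) {
            w->mismatches++;
        }
    }
    return NULL;
}

/* Two threads share one kernel object; returns the total mismatches. */
int run_two_threads(int iterations) {
    shared_kernel k;
    k.arg = 0;
    pthread_mutex_init(&k.lock, NULL);

    worker a = { &k, 1, iterations, 0 };
    worker b = { &k, 2, iterations, 0 };
    pthread_t ta, tb;
    pthread_create(&ta, NULL, worker_main, &a);
    pthread_create(&tb, NULL, worker_main, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    pthread_mutex_destroy(&k.lock);
    return a.mismatches + b.mismatches;
}
```

Calling `run_two_threads(10000)` should report no mismatches, because neither thread can slip in between the other's set and enqueue. In real code the unlock can come right after the enqueue call returns, since OpenCL captures the argument values when the kernel is enqueued.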
Did you run the multi-device example from the SDK? You should create a simple test case that AMD can test; otherwise they can't help you.
Since the threads use separate contexts, I don't think clSetKernelArg() is a problem for you.
Thread-safe API calls for multiple host threads were introduced in OpenCL 1.1. Please read the appendix of the specification that covers multiple host threads for more information on this.
Can you tell us the following?
1. Which GPU cards you are using?
2. If AMD, which version of Catalyst driver are you using?
3. Which OS? Bitness?
If possible, please post a small repro case. We can validate it on our end.