Does anyone have experience using multiple GPUs from a single thread?
We have a problem when using multiple GPUs concurrently.
This is what I do:
1. Prepare input data for GPU #0 with calResMap/calResUnmap
2. Prepare input data for GPU #1 with calResMap/calResUnmap
3. Use calCtxRunProgram to start the program on GPU #0
4. Use calCtxRunProgram to start the program on GPU #1
5. Use calCtxIsEventDone to wait for the program on GPU #0 to finish
6. Use calCtxIsEventDone to wait for the program on GPU #1 to finish
7. Read output data from GPU #0 with calResMap/calResUnmap
8. Read output data from GPU #1 with calResMap/calResUnmap
Both GPUs are the same type and I divide the load equally between the two GPUs. The program needs 2 seconds to finish on each GPU.
In step 5, waiting on calCtxIsEventDone takes 2 seconds. That is fine.
In step 6, the second calCtxIsEventDone wait also takes 2 seconds. But I expect the second wait to take almost no time if both GPUs run concurrently.
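To make the timing concrete, here is a small simulation of steps 3 to 6. It does not call CAL at all: MockCtx, mock_run_program and mock_is_event_done are made-up stand-ins, and time is a simulated counter in seconds. It is built on one assumption that I have not confirmed for CAL, namely that a queued program only starts running when its event is first checked. Under that assumption, waiting on one GPU after the other gives the 2 + 2 seconds I see, while checking both events in the same loop would finish in 2 seconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one GPU context; NOT a real CAL type. */
typedef struct {
    bool queued;    /* a run has been requested */
    bool started;   /* the work has actually begun executing */
    int  start;     /* simulated time (seconds) at which it began */
    int  duration;  /* simulated run time in seconds */
} MockCtx;

/* Stand-in for steps 3 and 4. ASSUMPTION: this only queues the work. */
static void mock_run_program(MockCtx *c, int duration) {
    assert(duration > 0);
    c->queued = true;
    c->started = false;
    c->start = 0;
    c->duration = duration;
}

/* Stand-in for a non-blocking "is the event done?" check.
   ASSUMPTION: the first check is what actually starts the queued work. */
static bool mock_is_event_done(MockCtx *c, int now) {
    if (c->queued && !c->started) {
        c->started = true;
        c->start = now;
    }
    return c->started && (now - c->start >= c->duration);
}

/* Steps 5 and 6 as I do them now: finish waiting on GPU #0 before the
   first check on GPU #1. Stores each wait and returns the total time. */
static int wait_one_after_another(MockCtx *g0, MockCtx *g1,
                                  int *wait0, int *wait1) {
    int now = 0;
    while (!mock_is_event_done(g0, now)) now++;   /* step 5 */
    *wait0 = now;
    int step6_begin = now;
    while (!mock_is_event_done(g1, now)) now++;   /* step 6 */
    *wait1 = now - step6_begin;
    return now;
}

/* Alternative: check both events on every pass of a single loop, so
   neither GPU waits for the other's loop before its first check. */
static int wait_interleaved(MockCtx *g0, MockCtx *g1) {
    int now = 0;
    bool done0 = false, done1 = false;
    for (;;) {
        done0 = done0 || mock_is_event_done(g0, now);
        done1 = done1 || mock_is_event_done(g1, now);
        if (done0 && done1) break;
        now++;
    }
    return now;
}
```

With a 2-second program queued on each mock GPU, wait_one_after_another reports waits of 2 and 2 (4 in total), and wait_interleaved returns 2. If the real API behaves like this model, checking both events once before either wait loop might be enough, but that is a guess on my part.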
Any help?