Does anyone have experience using multiple GPUs from a single thread?
We ran into a problem when trying to use multiple GPUs concurrently.
This is what I do:
1. Prepare input data for GPU #0 with calResMap/calResUnmap
2. Prepare input data for GPU #1 with calResMap/calResUnmap
3. Use calCtxRunProgram to start the program on GPU #0
4. Use calCtxRunProgram to start the program on GPU #1
5. Use calCtxIsEventDone to wait for the program on GPU #0 to finish
6. Use calCtxIsEventDone to wait for the program on GPU #1 to finish
7. Read output data from GPU #0 with calResMap/calResUnmap
8. Read output data from GPU #1 with calResMap/calResUnmap
Both GPUs are the same type, and I divide the load equally between them. The program needs 2 seconds to finish.
In step 5, waiting on calCtxIsEventDone takes 2 seconds. That is fine.
In step 6, waiting on the second calCtxIsEventDone also takes 2 seconds. But I expected the second wait to take almost no time if both GPUs were running concurrently.
I solved the problem myself.
The "ATI Stream Computing User Guide" says:
For improved performance, calCtxRunProgram does not immediately
dispatch the program for execution on the stream processor. To force the
dispatch, the application must call calCtxIsEventDone and calCtxFlush on
the corresponding event.
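So a program does not actually start until I call calCtxIsEventDone and calCtxFlush on its event. In my original sequence, the program on GPU #1 was only dispatched by the first calCtxIsEventDone call in step 6, after GPU #0 had already finished, so the two programs ran one after the other. Calling calCtxIsEventDone and calCtxFlush for both GPUs right after steps 3 and 4, before waiting in step 5, forces both programs to start.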
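
To make the timing easier to see, here is a small self-contained C model of the behaviour described in the guide. It does not call the CAL API at all: SimGpu, sim_run_program, sim_flush, sim_is_event_done and sim_wait are hypothetical stand-ins I made up for illustration, and time is a virtual clock rather than real GPU time. It assumes each program takes 2 seconds and that a queued program starts only when it is flushed or first polled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model, NOT the CAL API: a simulated GPU holds a queued
   program that only starts running once it has been dispatched. */
typedef struct {
    bool   queued;       /* submitted but not yet dispatched */
    bool   running;      /* dispatched to the device */
    double finish_time;  /* virtual time at which the program completes */
} SimGpu;

#define PROGRAM_SECONDS 2.0
#define POLL_STEP       0.5

static double sim_now = 0.0;  /* virtual clock in seconds */

/* Stand-in for calCtxRunProgram: only queues the work. */
static void sim_run_program(SimGpu *g) {
    g->queued = true;
    g->running = false;
}

/* Stand-in for forcing the dispatch: queued work starts at the current time. */
static void sim_flush(SimGpu *g) {
    if (g->queued) {
        g->queued = false;
        g->running = true;
        g->finish_time = sim_now + PROGRAM_SECONDS;
    }
}

/* Stand-in for polling the event: forces dispatch, then reports completion. */
static bool sim_is_event_done(SimGpu *g) {
    sim_flush(g);
    return g->running && sim_now >= g->finish_time;
}

/* Poll until the program is done, advancing the virtual clock.
   Returns how long this wait took. */
static double sim_wait(SimGpu *g) {
    assert(g != NULL);
    double start = sim_now;
    while (!sim_is_event_done(g)) {
        sim_now += POLL_STEP;
    }
    return sim_now - start;
}

/* Original order: run, run, wait, wait. GPU #1 is only dispatched when
   its wait begins, so the two programs run back to back. */
double total_without_flush(void) {
    SimGpu g0 = {0}, g1 = {0};
    sim_now = 0.0;
    sim_run_program(&g0);
    sim_run_program(&g1);
    sim_wait(&g0);
    sim_wait(&g1);
    return sim_now;
}

/* Fixed order: force the dispatch right after each submission, then wait. */
double total_with_flush(void) {
    SimGpu g0 = {0}, g1 = {0};
    sim_now = 0.0;
    sim_run_program(&g0);
    sim_flush(&g0);
    sim_run_program(&g1);
    sim_flush(&g1);
    sim_wait(&g0);
    sim_wait(&g1);
    return sim_now;
}
```

In this model total_without_flush() returns 4.0 virtual seconds (2 + 2, matching what I measured in steps 5 and 6), while total_with_flush() returns 2.0 because the second wait finds its program already finished. In real code the sim_flush calls would correspond to calling calCtxIsEventDone and calCtxFlush on each event right after calCtxRunProgram.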