Take a look at section 2.16.3 of the Stream Computing User Guide to see how to use multiple GPUs in a single thread. I would suggest creating separate threads for multiple GPUs, as leveraging asynchronous kernel calls requires a lot of tuning and the call might not be asynchronous in some cases. Take a look at the Brook+ sample MonteCarlo_MultiGPU and the tutorial MultiGPU.
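The one-thread-per-GPU approach can be sketched roughly as below. This is only a host-side illustration in standard C++ and not code from the SDK samples: `run_kernel_on_device` and `run_on_all_devices` are hypothetical names, and the function body is a CPU stand-in for wherever the actual Brook+ device selection, stream setup and kernel call would go. It also assumes a compiler with `std::thread`; a pthreads version would have the same shape.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical placeholder for the per-GPU work. In a real program this is
// where the thread would select GPU `device`, create its streams and call
// the kernel. Here it only does CPU arithmetic so the sketch is runnable.
static float run_kernel_on_device(int device, const std::vector<float>& input) {
    float sum = 0.0f;
    for (float v : input) {
        sum += v * static_cast<float>(device + 1);
    }
    return sum;
}

// Start one host thread per device so that a blocking kernel call on one
// GPU cannot stall the work submitted to another GPU.
std::vector<float> run_on_all_devices(int device_count,
                                      const std::vector<float>& input) {
    assert(device_count >= 0);
    const std::size_t n = static_cast<std::size_t>(device_count);
    std::vector<float> results(n, 0.0f);
    std::vector<std::thread> workers;
    workers.reserve(n);

    for (int d = 0; d < device_count; ++d) {
        // Each thread writes only to its own slot, so no locking is needed.
        workers.emplace_back([d, &input, &results] {
            results[static_cast<std::size_t>(d)] = run_kernel_on_device(d, input);
        });
    }

    // Wait for every device thread before reading the results.
    for (std::thread& w : workers) {
        w.join();
    }
    return results;
}
```

Calling `run_on_all_devices(2, data)` would start two worker threads, one per GPU, and return after both have finished. On Linux this typically needs the `-pthread` compiler flag.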
Thank you, gaurav.
In my tests, I found that in a multi-threaded program two kernels can execute in parallel on different GPUs.
But I still do not know how to run the two kernels in parallel from a single thread. My single-threaded program is much the same as the example in section 2.16.3 of the Stream Computing User Guide.
In my program, the two kernels operate on two different streams on different GPUs.
My OS is SLES 10, and the SDK is v1.4-beta.