Thank you, gaurav.
Through testing, I found that in a multithreaded program, two kernels can execute in parallel on different GPUs.
However, I still don't know how to run the two kernels in parallel from a single thread. My single-threaded program is much the same as the example in Section 2.16.3 of the Stream Computing User Guide.
In my program, the two kernels operate on two different streams, each on a different GPU.
My OS is SLES 10, and I am using SDK v1.4-beta.