By the way, what do you mean by "will execute synchronously" with the code?
CUDA streams keep executing on the device, and when you call "cudaThreadSynchronize()" you sync with the whole device. I don't think stream 0 is anything special...
Maybe for the other, non-zero streams you need cudaStreamSynchronize(), and for stream 0 you can use "cudaThreadSynchronize()"? Is that what you mean?
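On the OpenCL side you can use "clFinish(comm_Q)", which has the same effect as calling "cudaStreamSynchronize(stream#)": it blocks the host until everything queued on that command queue has finished.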
That's straightforward enough.
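
In case it helps, here is a rough sketch of the two kinds of sync on the CUDA side. The kernel "add_one" and the helper "run_demo" are made-up names just for illustration, not anything from your code. I use cudaDeviceSynchronize() here because it is the current name for the deprecated cudaThreadSynchronize(). Treat this as a minimal example under those assumptions, not a drop-in solution.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: adds 1 to every element.
__global__ void add_one(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: returns 0 on success, 1 on a CUDA error or wrong result.
int run_demo() {
    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    if (cudaMalloc(&dev, n * sizeof(int)) != cudaSuccess) return 1;

    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) {
        cudaFree(dev);
        return 1;
    }

    // Queue copy-in, kernel, and copy-out on the same non-zero stream.
    cudaMemcpyAsync(dev, host, n * sizeof(int), cudaMemcpyHostToDevice, stream);
    add_one<<<(n + 127) / 128, 128, 0, stream>>>(dev, n);
    cudaMemcpyAsync(host, dev, n * sizeof(int), cudaMemcpyDeviceToHost, stream);

    // Wait for the work queued on this one stream only.
    // The OpenCL counterpart would be clFinish() on the matching command queue.
    cudaError_t err = cudaStreamSynchronize(stream);

    // Wait for all outstanding work on the device, whatever stream it is in.
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    cudaStreamDestroy(stream);
    cudaFree(dev);
    if (err != cudaSuccess) return 1;

    // After the sync, the host buffer must hold the kernel's output.
    for (int i = 0; i < n; ++i) {
        if (host[i] != i + 1) return 1;
    }
    return 0;
}

int main() {
    int rc = run_demo();
    printf("%s\n", rc == 0 ? "ok" : "failed");
    return rc;
}
```

The host only reads the result buffer after the stream sync returns; reading it earlier would race with the copy-out.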
Please explain your problem in detail, and we will try to give you a solution.
Best Regards,
Bruhaspati