Device-level parallelism problem

  Hi all:

      In the SDK samples, "simpleMultiDevice" uses multiple GPUs. However, in my tests the first GPU still completes its calculation before the second one begins to work, so it is no faster than a single GPU. My question is: how can I make GPU[0] and GPU[1] work in parallel?

 Thanks a lot.