edwen

Multiple GPUs problem

Discussion created by edwen on Aug 31, 2010
Latest reply on Sep 1, 2010 by edwen
The GPUs stop working in parallel as the kernel gets more complicated

There are two kernels in my program, and the second one is more complicated than the first. I am trying to run them on 3 GPUs in parallel. With the first kernel everything looks normal: the total elapsed time is only slightly longer than the time on each individual GPU. With the second kernel, however, the total elapsed time is longer than the sum of the times on all 3 GPUs, as if the GPUs were no longer running in parallel. Can anyone help explain what may cause this? Here are the results:

First Kernel...

v  =    224.06217957
Elapsed time (without Greeks): 0.140319 sec

Profiling Information for GPU Processing:

Device 0 : Tesla T10 Processor
  Reduce Kernel     : 0.13262 s

Device 1 : Tesla T10 Processor
  Reduce Kernel     : 0.13462 s

Device 2 : Tesla T10 Processor
  Reduce Kernel     : 0.13322 s

Second Kernel...

v  =    224.06217957
Lb  =     21.34053040
Elapsed time (with Greeks): 1.558295 sec

Profiling Information for GPU Processing:

Device 0 : Tesla T10 Processor
  Reduce Kernel     : 0.51363 s

Device 1 : Tesla T10 Processor
  Reduce Kernel     : 0.51602 s

Device 2 : Tesla T10 Processor
  Reduce Kernel     : 0.51722 s


By the way, I am using Tesla C1060 cards, and I modified my program based on the sample program "oclsimpleMultiGPU" from the NVIDIA OpenCL SDK sample codes. I am working under Linux. Thanks,
