3 Replies Latest reply on Sep 1, 2010 6:18 AM by edwen

    multiple GPUs problem

    edwen
      The GPUs stop working in parallel when the kernel gets more complicated

      There are two kernels in my program, and the second is more complicated than the first. I am trying to run them on 3 GPUs in parallel. For the first kernel the results look normal: the total elapsed time is only slightly longer than the time on each individual GPU. For the second kernel, however, the total elapsed time is longer than the sum of the times on all 3 GPUs. Can anyone help explain what may be causing this? Here are the results:

      First Kernel...

      v  =    224.06217957
      Elapsed time (without Greeks): 0.140319 sec

      Profiling Information for GPU Processing:

      Device 0 : Tesla T10 Processor
        Reduce Kernel     : 0.13262 s

      Device 1 : Tesla T10 Processor
        Reduce Kernel     : 0.13462 s

      Device 2 : Tesla T10 Processor
        Reduce Kernel     : 0.13322 s

      Second Kernel...

      v  =    224.06217957
      Lb  =     21.34053040
      Elapsed time (with Greeks): 1.558295 sec

      Profiling Information for GPU Processing:

      Device 0 : Tesla T10 Processor
        Reduce Kernel     : 0.51363 s

      Device 1 : Tesla T10 Processor
        Reduce Kernel     : 0.51602 s

      Device 2 : Tesla T10 Processor
        Reduce Kernel     : 0.51722 s
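
A quick arithmetic check of the numbers above, as a minimal sketch in Python. The helper `classify` and its 10% tolerance are assumptions made for illustration and are not part of the original program. It compares the elapsed time against the slowest single device (what fully overlapped execution would give) and against the sum over all devices (what back-to-back execution would give).

```python
# Per-device "Reduce Kernel" times and elapsed times, copied from the output above.
first_kernel = {"elapsed": 0.140319, "devices": [0.13262, 0.13462, 0.13322]}
second_kernel = {"elapsed": 1.558295, "devices": [0.51363, 0.51602, 0.51722]}


def classify(run, tolerance=0.10):
    """Hypothetical helper: label a run by how its elapsed time compares
    with the per-device kernel times. The 10% tolerance is an assumption."""
    slowest = max(run["devices"])  # expected elapsed time if devices overlap fully
    total = sum(run["devices"])    # expected elapsed time if devices run one by one
    if run["elapsed"] <= slowest * (1 + tolerance):
        return "overlapped"
    if run["elapsed"] >= total * (1 - tolerance):
        return "serialized"
    return "partially overlapped"


for name, run in (("first", first_kernel), ("second", second_kernel)):
    print(f"{name} kernel: elapsed={run['elapsed']:.6f} s, "
          f"slowest device={max(run['devices']):.5f} s, "
          f"sum of devices={sum(run['devices']):.5f} s -> {classify(run)}")
```

With these figures the first kernel lands within the tolerance of its slowest device, while the second kernel's elapsed time is at least the sum of the three device times. That pattern is what one would expect if the three devices were executing one after another rather than concurrently.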


      By the way, I am using Tesla C1060 cards, and I based my program on the "oclSimpleMultiGPU" sample from the NVIDIA OpenCL SDK. I am working under Linux. Thanks.