2 Replies Latest reply on Aug 30, 2010 9:13 AM by davibu

    [BUG] Serial execution of kernels with multi-gpus and Linux

    davibu

      I noticed a huge loss of performance in my app when switching from SDK 2.1 to SDK 2.2 (and upgrading the Catalyst drivers too) on Linux.

      The problem seems to be related to the serial execution of kernels in the presence of multiple GPUs.

      As proof of the problem, you can just run the SimpleMultiDevice sample included in SDK 2.2. This is the result with my 5870+5850 on Ubuntu 10.04 64-bit (I have only increased the number of iterations and added a debug print of the start/stop of the kernels):

      ----------------------------------------------------------
      Multi GPU Test 1 : Single context Single Thread
      ----------------------------------------------------------
      Start: 375330115610 Stop: 381202753127
      Start: 381218532958 Stop: 388832982740
      Total time : 13523
      Time of GPU0 : 5872.64
      Time of GPU1 : 7614.45
      ----------------------------------------------------------
      Multi GPU Test 2 : Multiple context Single Thread
      ----------------------------------------------------------
      Start: 389060081314 Stop: 394932619479
      Start: 394953741242 Stop: 402569237756
      Total time : 13529
      Time of GPU0 : 5872.54
      Time of GPU1 : 7615.5
      ----------------------------------------------------------
      Multi GPU Test 3 : Multiple context Multiple Thread
      ----------------------------------------------------------
      Start: 410429884147 Stop: 416302165479
      Start: 402796812668 Stop: 410411107713
      Total time : 13505
      Time of GPU0 : 5872.28
      Time of GPU1 : 7614.3
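
The Start/Stop pairs in the log above can be checked for overlap directly. This is only a small sketch: it assumes the debug print shows OpenCL event profiling timestamps in nanoseconds (that is the unit OpenCL uses for profiling counters, but the log itself does not say so), and the names `LINUX_RUNS`, `duration_ms` and `overlap_ns` are made up for illustration.

```python
# Start/Stop pairs copied from the Linux log above, as (GPU0, GPU1) per test.
# Assumption: values are profiling timestamps in nanoseconds.
LINUX_RUNS = {
    "Test 1": [(375330115610, 381202753127), (381218532958, 388832982740)],
    "Test 2": [(389060081314, 394932619479), (394953741242, 402569237756)],
    "Test 3": [(410429884147, 416302165479), (402796812668, 410411107713)],
}


def duration_ms(interval):
    """Length of a (start, stop) interval, converted from ns to ms."""
    start, stop = interval
    return (stop - start) / 1e6


def overlap_ns(a, b):
    """Time during which two (start, stop) intervals both run, in ns (0 if none)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


for name, (gpu0, gpu1) in LINUX_RUNS.items():
    print(
        f"{name}: GPU0 {duration_ms(gpu0):.2f} ms, "
        f"GPU1 {duration_ms(gpu1):.2f} ms, "
        f"overlap {overlap_ns(gpu0, gpu1) / 1e6:.2f} ms"
    )
```

Under that assumption the computed durations line up with the "Time of GPU" values in the log, and in all three tests one kernel only starts after the other has stopped (in Test 3 the second device simply runs first).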

      As you can see, kernel execution seems to be serialized. The same test on the same PC but with Windows 7 64-bit:

      ----------------------------------------------------------
      Multi GPU Test 1 : Single context Single Thread
      ----------------------------------------------------------
      Total time : 4527.8
      Time of GPU0 : 3467.32
      Time of GPU1 : 3840.05
      ----------------------------------------------------------
      Multi GPU Test 2 : Multiple context Single Thread
      ----------------------------------------------------------
      Total time : 4504.14
      Time of GPU0 : 3467.34
      Time of GPU1 : 3839.97
      ----------------------------------------------------------
      Multi GPU Test 3 : Multiple context Multiple Thread
      ----------------------------------------------------------
      Total time : 4505.19
      Time of GPU0 : 3468.59
      Time of GPU1 : 3840.09

      Not only is the time to execute each kernel shorter, but the kernels run in parallel too.
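
The Windows log has no Start/Stop lines, so the parallel claim rests on the totals. A rough way to see it, sketched below, is to compare each "Total time" with the sum of the two per-GPU times: a ratio near 1 means the kernels ran back to back, a clearly lower ratio means they overlapped. The names `RESULTS` and `total_to_sum_ratio` are hypothetical, and the numbers are taken as printed, with no unit assumed.

```python
# (total, gpu0, gpu1) for Tests 1-3, copied from the two logs above.
RESULTS = {
    "Linux": [
        (13523, 5872.64, 7614.45),
        (13529, 5872.54, 7615.5),
        (13505, 5872.28, 7614.3),
    ],
    "Windows 7": [
        (4527.8, 3467.32, 3840.05),
        (4504.14, 3467.34, 3839.97),
        (4505.19, 3468.59, 3840.09),
    ],
}


def total_to_sum_ratio(total, gpu0, gpu1):
    """Total time divided by the sum of both GPU times.

    About 1.0 suggests serial execution; lower values suggest overlap.
    """
    return total / (gpu0 + gpu1)


for system, runs in RESULTS.items():
    for test_number, run in enumerate(runs, start=1):
        print(f"{system} Test {test_number}: total/sum = {total_to_sum_ratio(*run):.2f}")
```

On the Linux numbers the total is essentially the sum of the two GPU times, while on Windows it is well below the sum and not far above the slower GPU alone.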

      It looks like running an OpenCL kernel under Linux totally freezes the PC (no mouse response, no thread execution, etc.). My guess is a Linux kernel lock or something like that.


        • [BUG] Serial execution of kernels with multi-gpus and Linux
          genaganna

           


          Are you using driver 10.8?  Thanks for reporting this issue.