davibu

[BUG] Serial execution of kernels with multi-gpus and Linux

Discussion created by davibu on Aug 30, 2010
Latest reply on Aug 30, 2010 by davibu

I noticed a huge loss of performance in my application when switching from SDK 2.1 to SDK 2.2 (and upgrading the Catalyst drivers as well) on Linux.

The problem seems to be related to kernels being executed serially when multiple GPUs are present.

To reproduce the problem, you can simply run the SimpleMultiDevice sample included in SDK 2.2. This is the result with my 5870 + 5850 on Ubuntu 10.04 64-bit (I only increased the number of iterations and added a debug print of each kernel's start/stop times):

----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Start: 375330115610 Stop: 381202753127
Start: 381218532958 Stop: 388832982740
Total time : 13523
Time of GPU0 : 5872.64
Time of GPU1 : 7614.45
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Start: 389060081314 Stop: 394932619479
Start: 394953741242 Stop: 402569237756
Total time : 13529
Time of GPU0 : 5872.54
Time of GPU1 : 7615.5
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Start: 410429884147 Stop: 416302165479
Start: 402796812668 Stop: 410411107713
Total time : 13505
Time of GPU0 : 5872.28
Time of GPU1 : 7614.3
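
The Start/Stop values look like nanosecond timestamps: for every line, (Stop - Start) / 10^6 matches the reported per-GPU time in milliseconds. The following is a minimal Python sketch that checks whether the two kernels in each Linux test overlapped. It assumes both GPUs' timestamps come from the same clock. If the values were read with clGetEventProfilingInfo, the OpenCL specification describes them as a device time counter, so comparing values across GPUs is an assumption, not a guarantee. The helper names overlap_ns and duration_ms are hypothetical.

```python
# Minimal sketch: decide whether two logged kernel executions overlapped in time.
# Assumption: Start/Stop are nanosecond timestamps from a clock shared by both
# GPUs, so values printed for different devices can be compared directly.

def overlap_ns(a, b):
    """Return how many nanoseconds intervals a=(start, stop) and b overlap (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def duration_ms(interval):
    """Convert a (start, stop) nanosecond interval to milliseconds."""
    start, stop = interval
    return (stop - start) / 1e6

# Start/Stop pairs copied from the Linux log above (one pair per printed line).
linux_runs = {
    "Test 1": ((375330115610, 381202753127), (381218532958, 388832982740)),
    "Test 2": ((389060081314, 394932619479), (394953741242, 402569237756)),
    "Test 3": ((410429884147, 416302165479), (402796812668, 410411107713)),
}

if __name__ == "__main__":
    for name, (first, second) in linux_runs.items():
        print(f"{name}: {duration_ms(first):.2f} ms, {duration_ms(second):.2f} ms, "
              f"overlap {overlap_ns(first, second)} ns")
```

With the numbers above, all three tests report 0 ns of overlap. In each pair, one kernel starts roughly 16 to 21 ms after the other one stops.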

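As you can see, kernel execution appears to be serialized: in every test, one GPU starts its kernel only after the other has stopped, and the total time (about 13500) is roughly the sum of the two per-GPU times (5872 + 7614). The same test on the same PC, but under Windows 7 64-bit: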

----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 4527.8
Time of GPU0 : 3467.32
Time of GPU1 : 3840.05
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 4504.14
Time of GPU0 : 3467.34
Time of GPU1 : 3839.97
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 4505.19
Time of GPU0 : 3468.59
Time of GPU1 : 3840.09

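Not only does each kernel take less time, but the kernels also run in parallel: the total time (about 4500) is close to the slower GPU's time (about 3840) rather than to the sum of both (about 7307).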
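
The same comparison can be made without timestamps, using only the Total time and the per-GPU times. The following is a minimal Python sketch. The overlap_fraction metric is a hypothetical helper, not part of the SimpleMultiDevice sample. A value of 0 means the two workloads ran back to back (total equals the sum of the GPU times), and 1 means they fully overlapped (total equals the slower GPU's time). It assumes host-side overhead included in the total is small.

```python
# Minimal sketch: estimate how much two GPU workloads overlapped from the
# wall-clock total and the per-GPU times printed by the sample.
# 0.0 -> fully serialized (total == sum of GPU times)
# 1.0 -> fully overlapped (total == slowest GPU time)
# Assumption: host overhead inside "Total time" is small, so values slightly
# outside [0, 1] are possible.

def overlap_fraction(total, gpu_times):
    serial = sum(gpu_times)      # expected total if kernels ran one after another
    parallel = max(gpu_times)    # expected total if kernels ran fully concurrently
    if serial == parallel:
        raise ValueError("need at least two non-zero GPU times")
    return (serial - total) / (serial - parallel)

# Test 1 (single context, single thread) figures from the two logs above.
results = {
    "Linux": (13523.0, (5872.64, 7614.45)),
    "Windows 7": (4527.8, (3467.32, 3840.05)),
}

if __name__ == "__main__":
    for os_name, (total, times) in results.items():
        print(f"{os_name}: overlap fraction {overlap_fraction(total, times):.2f}")
```

For Test 1 this gives about -0.01 on Linux, slightly below zero because the total also includes host overhead. On Windows 7 it gives about 0.80.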

It looks like running an OpenCL kernel under Linux completely freezes the PC (no mouse response, no thread execution, etc.). My guess is a lock in the Linux kernel or something similar.

