davibu
Journeyman III

[BUG] Serial execution of kernels with multi-gpus and Linux

I noticed a huge loss of performance in my app when switching from SDK 2.1 to SDK 2.2 (and upgrading the Catalyst drivers too) on Linux.

The problem seems to be related to kernels being executed serially when multiple GPUs are present.

As proof of the problem, you can just run the SimpleMultiDevice sample included in SDK 2.2. This is the result with my 5870+5850 on Ubuntu 10.04 64-bit (I have only increased the number of iterations and added a debug print of the start/stop times of the kernels):

----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Start: 375330115610 Stop: 381202753127
Start: 381218532958 Stop: 388832982740
Total time : 13523
Time of GPU0 : 5872.64
Time of GPU1 : 7614.45
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Start: 389060081314 Stop: 394932619479
Start: 394953741242 Stop: 402569237756
Total time : 13529
Time of GPU0 : 5872.54
Time of GPU1 : 7615.5
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Start: 410429884147 Stop: 416302165479
Start: 402796812668 Stop: 410411107713
Total time : 13505
Time of GPU0 : 5872.28
Time of GPU1 : 7614.3
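
The Start/Stop pairs in the Linux log can be checked for overlap directly. The following is a minimal sketch, not part of the SDK sample: it assumes the timestamps are in nanoseconds (which fits, since Stop minus Start divided by 1e6 reproduces the reported per-GPU times in milliseconds), and the names `linux_runs`, `overlap_ns` and `duration_ms` are made up for illustration.

```python
# Start/Stop timestamps copied from the Linux log above.
# Assumption: values are nanoseconds, because (Stop - Start) / 1e6
# matches the "Time of GPU0/GPU1" figures, which then read as milliseconds.
linux_runs = {
    "Test 1": [(375330115610, 381202753127), (381218532958, 388832982740)],
    "Test 2": [(389060081314, 394932619479), (394953741242, 402569237756)],
    "Test 3": [(410429884147, 416302165479), (402796812668, 410411107713)],
}


def overlap_ns(a, b):
    """Return how long two (start, stop) intervals overlap, in ns (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def duration_ms(interval):
    """Convert a (start, stop) interval in ns to a duration in ms."""
    return (interval[1] - interval[0]) / 1e6


for name, (first, second) in linux_runs.items():
    print(name,
          round(duration_ms(first), 2),
          round(duration_ms(second), 2),
          "overlap ns:", overlap_ns(first, second))
```

Under that assumption the overlap is 0 ns in all three tests: one kernel only starts after the other has stopped, including in Test 3, where the second device simply runs first.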

As you can see, kernel execution seems to be serialized. The same test on the same PC, but with Windows 7 64-bit:

----------------------------------------------------------
Multi GPU Test 1 : Single context Single Thread
----------------------------------------------------------
Total time : 4527.8
Time of GPU0 : 3467.32
Time of GPU1 : 3840.05
----------------------------------------------------------
Multi GPU Test 2 : Multiple context Single Thread
----------------------------------------------------------
Total time : 4504.14
Time of GPU0 : 3467.34
Time of GPU1 : 3839.97
----------------------------------------------------------
Multi GPU Test 3 : Multiple context Multiple Thread
----------------------------------------------------------
Total time : 4505.19
Time of GPU0 : 3468.59
Time of GPU1 : 3840.09

Not only is the kernel execution time shorter, but the kernels also run in parallel.

It looks like running an OpenCL kernel under Linux totally freezes the PC (no mouse response, no thread execution, etc.). My guess is a Linux kernel lock or something like that.
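
A rough way to read the totals: if the two kernels run one after the other, the total should be close to the sum of the per-GPU times; if they run concurrently, it should be close to the larger of the two. The sketch below applies that rule of thumb to the Test 1 figures from both logs. The `classify` helper and its "nearest of sum or max" rule are an illustrative assumption, not something the SDK sample computes.

```python
# (total, GPU0, GPU1) times for "Test 1", copied from the two logs above.
results = {
    "Linux": (13523.0, 5872.64, 7614.45),
    "Windows": (4527.8, 3467.32, 3840.05),
}


def classify(total, gpu0, gpu1):
    """Label a run by whether the total is nearer the sum or the max of the per-GPU times."""
    serial = gpu0 + gpu1        # expected total if the kernels run back to back
    parallel = max(gpu0, gpu1)  # expected total if the kernels fully overlap
    if abs(total - serial) < abs(total - parallel):
        return "serialized"
    return "overlapped"


for system, (total, gpu0, gpu1) in results.items():
    print(system, classify(total, gpu0, gpu1))
```

With these numbers the Linux run lands within a few tens of milliseconds of the sum (13487.09), while the Windows total sits much nearer the slower GPU's time than the sum (7307.37).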


2 Replies
genaganna
Journeyman III

Are you using driver 10.8?  Thanks for reporting this issue.


Originally posted by: genaganna

Are you using driver 10.8?  Thanks for reporting this issue.

 

 

Yes. 10.7b doesn't work at all with multiple GPUs: the X server raises a memory fault as soon as you run an OpenCL app (it does work if only one GPU is configured).

 
