c0nfig
Journeyman III

GPU tasks performance issue when screen goes into standby mode

Hi,

I have an OpenCL program that runs about 25% faster when the screen goes into standby mode.

At the same time, the compiz process goes to 100% CPU usage. This is a known bug, discussed here: https://bugs.launchpad.net/ubuntu/+source/compiz/+bug/969860

That site suggests a workaround (enable "Force full screen redraw (buffer swap) on repaint") which fixes the 100% CPU usage issue, but the performance now stays low even when the screen goes into standby mode.

I have seen this behavior on Ubuntu 12.04 with a Radeon 6970 and a 7970, using SDK 2.7 and the 12-4 and 12-6 drivers.

I guess that getting better performance when the screen goes into standby mode is reasonable, but right now, to achieve this, I must "pay" with one CPU core at 100% usage.

I can't tell if this is a general GPU driver issue or an OpenCL one.

0 Likes
25 Replies
yurtesen
Miniboss

If you are running OpenCL apps remotely, did you try without logging in to Ubuntu? You should be able to run OpenCL apps while the computer is sitting at the login prompt. Try it and let us know the results:

http://devgurus.amd.com/message/1282136#1282136

0 Likes

Thanks, I used the info you wrote in the other post.

I ran the app without logging in, and the performance is still lower than when I log in and let the screen go into standby.

0 Likes

It is possible that the card does not leave power-saving mode when the screen is not in standby. Did you check the current clock speeds with the aticontrol program while your program is running, to verify this?

How long is your test case? It is also possible that the time the card takes to switch from low-power to performance mode is where you are losing that 20-25% performance. Perhaps when compiz runs, it forces the card to stay in performance mode at all times (you can confirm this with aticontrol getclocks as well).

0 Likes

I used aticonfig --odgc to see the current peak clock.

From what I saw, the current peak clock is always equal to 925MHz which is the maximum clock in my settings for this card.

I noticed that the GPU load gets higher in standby mode (from about 40% to 55%), which may be caused by the compiz process plus my program.

Each test case runs for a few minutes, and the performance is steady in both standby and "awake" mode.

I tried running samples from the SDK to see if it also happens there.

The only difference I've found so far is one line in BufferBandwidth:

when awake:

Page fault       1670.84 ns

when standby:

Page fault       837.94  ns

It doesn't tell me a lot, but maybe it has something to do with my issue.



0 Likes
Wenju
Elite

Hi c0nfig,

I think you should run a test on Windows, to confirm whether it's a driver issue or a program issue.

0 Likes

Hi Wenju,

I tested it on Windows; the performance didn't change when the screen went into standby.

Moreover, my program runs faster on Windows (about 20% faster than the fastest I could get with the Linux version).

0 Likes

On Linux, if you let the CPU usage go back to 100%, what's the result? Faster again?

0 Likes

I'll use some numbers to ease this discussion.

It doesn't matter whether I start the program when the screen is off (through SSH) or directly from the computer with the screen on.

Moreover, during these tests, every few minutes I moved the mouse to wake the screen and then let it go off again (after 5 minutes).

Ok, so these are the two cases:

  • Without the compiz fix, the compiz process takes 100% CPU when the screen is off. When the screen is on, my program's performance is 14000 steps per minute. When it goes off, the performance rises to about 21000 steps per minute.
  • With the compiz fix, the CPU usage of the compiz process is negligible. Whether the screen is on or off, the performance is about 14000.
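
As a quick check of the figures in the two cases above, the relative gain can be worked out directly. A minimal sketch; the helper name `relative_speedup` is made up for illustration:

```python
def relative_speedup(baseline, improved):
    """Fractional throughput gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline


# Steps per minute quoted above: screen on vs. screen off (without the compiz fix).
gain = relative_speedup(14000, 21000)
print(f"{gain:.0%}")  # prints "50%"
```

With these figures, the rate with the screen off is 50% higher than the rate with the screen on.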
0 Likes

Hi c0nfig,

I think you should do another test: close the compiz process and run your program.

So far I am just speculating: when the screen is off, the GPU should be idle. The bug makes the compiz process take 100% CPU, so at that point compiz is not performing very well (3D rendering, for example). When the screen goes into standby mode, compiz performs much worse, or maybe doesn't work at all, so a lot of GPU resources are left for your program. On the other hand, once you apply the fix, compiz performs normally even in standby mode.

0 Likes

Hi,

As I previously replied to yurtesen, I ran it via SSH without any user logged in (I will recheck, but I think that means compiz was closed). The result was the same slower performance (with the screen either on or off), just like with compiz and the compiz fix.

For some reason that compiz bug makes the performance higher in standby mode.

I'll try to write a simple kernel so you'll be able to see whether it also happens on your systems.

0 Likes

If you can provide some test code, I can try to run it. What I was thinking is that the card is going into power-saving mode, or switching back and forth, and losing performance. You already said the current peak was 925, but the important part is the 'Current Clocks' line (did you mean that?). Perhaps it makes sense to check whether 'Current Clocks' shows max speed when compiz is using 100% CPU (to see whether compiz is causing the card to stay in high-speed mode).
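
A small sketch of how that comparison could be automated. It assumes the text comes from `aticonfig --odgc` and looks like the 7900-series output posted later in this thread; the function names are hypothetical, and the sample values are copied from that later post, not measured here.

```python
import re

# Sample text in the format of the `aticonfig --odgc` output shown later in the thread.
SAMPLE = """\
Adapter 0 - AMD Radeon HD 7900 Series
                            Core (MHz)    Memory (MHz)
           Current Clocks :    300           150
             Current Peak :    1010           1375
  Configurable Peak Range : [300-1125]     [150-1575]
                 GPU load :    0%
"""


def parse_clocks(text):
    """Return {label: (core_mhz, memory_mhz)} for the two clock lines."""
    clocks = {}
    for label in ("Current Clocks", "Current Peak"):
        match = re.search(rf"{label}\s*:\s*(\d+)\s+(\d+)", text)
        if match:
            clocks[label] = (int(match.group(1)), int(match.group(2)))
    return clocks


def at_peak(clocks):
    """True when the card is currently running at its peak clocks."""
    return clocks["Current Clocks"] == clocks["Current Peak"]


clocks = parse_clocks(SAMPLE)
print(clocks, "at peak:", at_peak(clocks))
```

In practice the text would be captured from the tool while the OpenCL program is running, once with the screen on and once in standby.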

I don't know if it is possible, but maybe you can try to force the clocks to stay high at all times? I am not sure whether CCC would show you anything under Linux... But I found this utility, which you might be able to try:

http://manpages.ubuntu.com/manpages/hardy/man8/rovclock.8.html

I am just fishing here...

0 Likes

Hi,

I've prepared a code sample. I simply took a vector addition example and changed the kernel to be more time consuming so the bottleneck won't be the CPU.

The program starts the kernel and calls a blocking buffer copy and repeats in an infinite loop. The program prints the number of steps every 10 seconds.
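
The measurement loop described above can be sketched roughly as follows. This is not the posted test program: `run_step` is a hypothetical stand-in for one kernel launch plus the blocking buffer copy, and `count_steps` is a made-up helper that counts how many such steps finish in each reporting interval (it stops after a fixed number of reports instead of looping forever).

```python
import time


def count_steps(run_step, interval=10.0, reports=3, clock=time.monotonic):
    """Call run_step() in a loop and return the steps completed per interval."""
    results = []
    for _ in range(reports):
        steps = 0
        deadline = clock() + interval
        while clock() < deadline:
            run_step()  # one "kernel + blocking copy" iteration
            steps += 1
        results.append(steps)
        print(f"steps per {interval:g} secs : {steps}")
    return results


# Short demo with a sleep standing in for the GPU work.
count_steps(lambda: time.sleep(0.001), interval=0.05, reports=2)
```

Because each step blocks until it finishes, the count per interval is a direct throughput measure, which is what makes the screen-on and standby numbers comparable.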

On my computer when compiz is "fine" I get about 3900 steps per 10 seconds.

During the same run, after a minute the screen goes into standby, and when compiz starts to go wild (about 30 seconds after the screen enters standby) I get 4800 steps per 10 seconds.

During the whole test, aticonfig showed both current peak and current clock at 925MHz.

I can set the clock rates for the cores and for the memory using the aticonfig overdrive utility. I tried playing with this; it changed the performance, of course, but the behavior remained.

Does rovclock actually force the clock frequencies, or is it like AMD OverDrive?

BTW, yurtesen, you were right: in that message I meant both current peak and current clock.

0 Likes

c0nfig wrote:

Does rovclock actually force the clock frequencies, or is it like AMD OverDrive?

I don't know; my hope was that it would force the card to stay at high speed even when idling. If you are able to run on Windows, MSI Afterburner's unofficial overclocking mode (2) does this, for Tahiti cards at least. It is able to disable PowerPlay (or it appears to take over from it). But you already said it runs fine on Windows, so... you might have to find a utility which does the same on Linux.

PS. I cut my finger today so I am barely able to type... I won't be able to test your code right away; I'm taking it easy for a little while...

0 Likes

Oh, sorry to hear that; resting your finger is absolutely the better idea.

I don't have this issue on Windows, so I'll just try to use rovclock and post an update.

0 Likes

I just downloaded your test files and will try to run them soon. How did it go with rovclock?

0 Likes

Well, I am getting something like this as output on a 5870:

$ ./a.out

steps per 10 secs : 5012

steps per 10 secs : 5070

steps per 10 secs : 5066

steps per 10 secs : 5081

steps per 10 secs : 5080

6 x 5000 = 30000 per minute? Tomorrow I can try to run it on a 7970, I guess... It seems unlikely that a 5870 would beat a 6970 or 7970. The next problem is that the 7970 I have to use is clocked at 1010MHz in performance mode... I guess I will be able to tell whether I am having the same problem once I run the program directly from the console.

I checked with top, and I have roughly 10% CPU usage from your program only. I tested it without logging in, with the screen at the login screen (Ubuntu 12.04, APP SDK 2.7 and Catalyst 12.6 driver; I am not sure whether it goes to standby, I just installed Ubuntu on that box). I will also try it directly from the console tomorrow.

Also, just out of curiosity, why didn't you use vector elements in your kernel if this is a vector addition example? (Just for fun I tried float4, and it doubles my performance to ~11800 elements per 10 seconds on the 5870.)

0 Likes

Well, when at the console, it appears the performance depends on whether anything is moving on the display. I am able to get consistent speeds if nothing is moving on screen. Perhaps X is monopolizing the card?

$ ./a.out

steps per 10 secs : 4569

steps per 10 secs : 5042

steps per 10 secs : 5038

steps per 10 secs : 5035

steps per 10 secs : 5039

steps per 10 secs : 5028

steps per 10 secs : 5037

0 Likes

Do you happen to have the bug where compiz goes to 100% CPU on your Ubuntu 12.04?

Is this last result from your 7970 @ 1010 MHz?

I tried to use rovclock, but it kept throwing the error "Invalid reference clock from BIOS: 0.0 MHz" on any operation I tried, though it did find the ATI card (it prints "Found ATI card on 01:00 ...").

Thanks for the help.

0 Likes

c0nfig wrote:

Do you happen to have the bug where compiz goes to 100% CPU on your Ubuntu 12.04?

Is this last result from your 7970 @ 1010 MHz?

I tried to use rovclock, but it kept throwing the error "Invalid reference clock from BIOS: 0.0 MHz" on any operation I tried, though it did find the ATI card (it prints "Found ATI card on 01:00 ...").

Thanks for the help.

I ran your program only on the 5870 with Ubuntu 12.04. When I tested it, I was not logged in, so compiz was not using the GPU at all. But I have the same compiz problem: if I leave a user logged in, compiz uses 100% of a single core (but I didn't run your program while compiz was running).

Unfortunately I didn't have time to get to the 7970; there were some hardware problems in that machine (it will be fixed soon), and it is running Ubuntu 11 (I just remembered now). If you want, I can run it on the 7970 at some point once I get it up and running.

From what I see on my 5870, there does not seem to be a problem with the clocks. Sometimes when I run your program the first iteration is a little slower, but the others are quite stable at ~5000 steps/10sec. But of course this problem may affect only some cards (though then I would expect the 5000 and 6000 series to perform more or less similarly).

0 Likes

I am getting exactly the same behavior even when compiz is at 100% (the first iteration was sometimes a little slower in my previous tests too). This is on a 5870, 850MHz GPU / 1200MHz GDDR5:

# ./a.out

steps per 10 secs : 4638

steps per 10 secs : 5061

steps per 10 secs : 5058

steps per 10 secs : 5062

steps per 10 secs : 5064

steps per 10 secs : 5065

^C

top - 01:12:20 up  4:48,  4 users,  load average: 1.37, 1.37, 1.44

Tasks: 199 total,   2 running, 196 sleeping,   0 stopped,   1 zombie

Cpu(s):  2.4%us, 11.9%sy,  0.0%ni, 85.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

Mem:  16435264k total,  1392672k used, 15042592k free,    35116k buffers

Swap: 16775164k total,        0k used, 16775164k free,   431780k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

4066 supremum  20   0 1284m  90m  44m R  100  0.6 147:32.48 compiz

7698 root      20   0  174m  53m  22m S   10  0.3   0:01.83 a.out

0 Likes

Hi,

Thanks for your test.

Hmm, so maybe it depends on the card.

I'll install a clean Ubuntu to test this program again.

Also, the fact that a 5870 outperforms my 7970 is suspicious.

Just to be sure, please tell me whether you are using the latest versions of the APP SDK and the graphics drivers.

0 Likes

I have the box with the 7970 up and running, and I will be back soon with some numbers. I am using SDK 2.7 and the 12.6 drivers (I mentioned it earlier). Actually, I just installed Ubuntu 12.04 from scratch on this box.

0 Likes

Hmm, right, something is strange here... GPU load shows 0%:

Adapter 0 - AMD Radeon HD 7900 Series

                            Core (MHz)    Memory (MHz)

           Current Clocks :    300           150

             Current Peak :    1010           1375

  Configurable Peak Range : [300-1125]     [150-1575]

                 GPU load :    0%

and the performance is terrible...

$ ./a.out

steps per 10 secs : 1448

steps per 10 secs : 1465

steps per 10 secs : 1468

steps per 10 secs : 1465

Anyway, there is also a problem in your loop. Aren't you failing to wait for kernel execution to finish before running the enqueue read? I get 50% better performance if I put a clFinish between the enqueue kernel and enqueue read statements. But that is not very efficient... (on the 7970 it now uses 50% of the card with clFinish; you should find a better solution...)

On the other hand, if I add clFinish on the 5870, there is no difference in execution...

0 Likes

Is clFinish necessary? The command queue keeps the order of the clEnqueue* commands, and the clEnqueueReadBuffer call is blocking. Am I missing something here?

0 Likes

Yes. Because the enqueue kernel command is not blocking, the host program continues and runs the read buffer command (which will try, in a blocking fashion, to read data from the memory your kernel is working on). You should use events to keep track of kernel execution, and try not to read or write memory areas that the kernel is using while it is executing (obviously). I think I am right, but double-check with the manual.
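
For reference on the ordering question: as far as the OpenCL specification goes, a command queue created without CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE is in-order, so each command starts only after the previously enqueued one has finished, and a clEnqueueReadBuffer call with blocking_read set to CL_TRUE does not return until the data has been copied to host memory. On that reading, a kernel launch followed by a blocking read on the same in-order queue should not need clFinish for correctness, while events remain the usual tool for non-blocking reads or multiple queues. The sketch below is not OpenCL: it is a toy Python model (the `InOrderQueue` class and its methods are made up for illustration) showing that a non-blocking "kernel" enqueue followed by a blocking "read" still sees the kernel's result.

```python
import queue
import threading
import time


class InOrderQueue:
    """Toy in-order command queue: a single worker runs commands FIFO."""

    def __init__(self):
        self._commands = queue.Queue()
        threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            func, done = self._commands.get()
            func()      # run the command to completion...
            done.set()  # ...then mark its "event" as complete

    def enqueue(self, func, blocking=False):
        """Queue a command; return an event that is set when it completes."""
        done = threading.Event()
        self._commands.put((func, done))
        if blocking:
            done.wait()  # analogous to a blocking read
        return done


buffer = [0]
result = []


def kernel():
    time.sleep(0.05)  # pretend the kernel takes a while
    buffer[0] = 42


q = InOrderQueue()
kernel_done = q.enqueue(kernel)                              # returns immediately
q.enqueue(lambda: result.append(buffer[0]), blocking=True)   # waits its turn
print(result[0])  # prints 42: the read ran after the kernel finished
```

This only models ordering; it says nothing about why adding clFinish changed the measured performance on the 7970.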

0 Likes