
tomer_gal
Adept I

Performance - GPU Power management

Hi,

My name is Tomer Gal. I am the CTO of OpTeamizer, a company that does OpenCL development and consulting for various companies.

Now to the question...

I assume the GPU does power management and therefore initially runs at a lower frequency/voltage, which results in lower performance until it transitions to a high frequency/voltage.

The goal is to be able to put the GPU in high performance mode and not suffer from the initial slowness.

What is the API for controlling this?

Regards,

Tomer Gal

0 Likes
12 Replies
jtrudeau
Staff

Welcome! I have whitelisted you and moved this into the OpenCL forum.

pinform

0 Likes
dipak
Big Boss

Hi Tomer,

Just to let you know, I've asked some experienced folks about this and am trying to get relevant information from them. As soon as I get any, I'll share it with you.

Regards,

0 Likes
velan
Staff

Tomer,

Can you share the time window you observe for this low performance? In other words, how long does the clock take to switch from idle to the required performance level?

We have a few internal tools which can set the GPU to a fixed clock, but unfortunately they are not available to the public. It might help to get in touch with an AMD field application engineer.

I have a few recommendations:

   - If you are using an AMD CPU, can you check in the BIOS: Chipset -> GFX Configuration -> PSPP Policy -> Disabled. Note that this varies from OEM to OEM.

   - Can you also explore the ADL SDK APIs (_PowerControl_Set(), _State_Set(), _PerformanceLevels_Set())? I am still checking on the support level for these APIs and will keep you posted. The ADL SDK can be accessed from http://developer.amd.com/tools-and-sdks/graphics-development/display-library-adl-sdk/

Thanks,

Velan

0 Likes

Hi Velan,

I will check the ADL SDK, expecting it to solve the problem.

As for the time window, it varies greatly. I have an OpenCL image processing filter made of many kernels. This filter normally runs in 9 ms; however, the first time it is run, the time can vary from 13 ms to 120 ms.

Examining this in CodeXL shows:

1. Kernels which should normally run for a few microseconds are running for a few milliseconds.

2. There are bigger idle gaps between the code which moves data from GPU to CPU and then back from CPU to GPU. If I remember correctly, my observation was that there is a gap between the time the API was invoked and the time it was actually performed on the GPU, leading me to believe the commands are not being flushed.

As this work is for the healthcare segment, it is not acceptable for any frame to exceed the frame time budget, especially if this happens consistently on the first frame and can make it more than 4x slower.
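The first-frame versus steady-state gap described above can be quantified with a small timing harness. This is only a minimal sketch in Python, not the code used in this thread: run_frame is a hypothetical callable that is assumed to enqueue all kernels of one frame and block until the GPU has finished, and the helper names are made up for illustration.

```python
import time
from statistics import median


def time_frames(run_frame, n_frames=10):
    """Return per-frame wall-clock durations in milliseconds.

    run_frame is a hypothetical callable assumed to submit one full
    frame of work and block until it has completed.
    """
    durations = []
    for _ in range(n_frames):
        start = time.perf_counter()
        run_frame()
        durations.append((time.perf_counter() - start) * 1000.0)
    return durations


def first_frame_ratio(durations):
    """Ratio of the first frame's time to the median of the later frames."""
    steady = median(durations[1:])
    return durations[0] / steady


def over_budget(durations, budget_ms):
    """Indices of frames whose duration exceeds the frame-time budget."""
    return [i for i, d in enumerate(durations) if d > budget_ms]
```

Calling over_budget(time_frames(run_frame), budget_ms) lists the frames that miss the budget, and first_frame_ratio shows how much slower the first frame is than the rest. Wall-clock timing includes host-side overhead, so it complements a profiler trace rather than replacing it.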

I will use the ADL SDK and report whether it solves the problem.

Thank you Velan!

0 Likes

Hi,

I've been trying to run the Overdrive_Sample, which I think is the sample most relevant to my needs.

Getting the following error:

Cannot find active AMD adapter

The setup is:

Windows 8 64 bit

AMD W7000

Using latest AMD drivers.

Intel CPU with HD Graphics

When debugging, it finds the AMD adapter but it is not active.

What is the reason? Is it possible that this happens due to the usage of the Intel HD Graphics? If so, should the solution be to ignore the fact that the AMD adapter is not active?

Regards,

Tomer Gal

0 Likes

I tried ignoring the adapterActive field. This does not help, because the next error is: Can't get Overdrive capabilities

As a side note: I also tried running as admin and in Windows 7 compatibility mode.

It doesn't work, so I guess AMD support is required on this issue.

0 Likes

Tomer,

Based on your observations in CodeXL and the run time varying up to 120 ms, I feel that this has nothing to do with a GPU clock warm-up issue. I confirmed that when an OpenCL application is run, the GPU jumps to high performance quickly; 13 ms to 120 ms is a huge number here.

I feel this is something specific to OpenCL. I am not an OpenCL expert, so please stay tuned while we get an OpenCL expert to answer here.

On a minor note, the ADL failure could be due to the display not being connected to the AMD GPU.

Thanks

Velan

0 Likes

Thanks, velan, for sharing the information.

Hi Tomer,

As Velan suspects that the varying run time issue may be related to OpenCL rather than GPU clock warmup performance, could you please provide a reproducible test-case that manifests the same problem?

Regards,

0 Likes

Hi dipak,

First, regarding the Overdrive_Sample, after I attached the AMD GPU as display, it is recognized as adapterActive but still states: Can't get Overdrive capabilities

As for a reproducible test-case of the OpenCL code, I can't due to it being intellectual property of my customers.

However, I am attaching a screenshot of CodeXL showing the issue. As you can see, of the first 3 frames, the first frame is slower than each of the other 2.

The first frame is slower for 2 reasons: kernels taking more time, and larger idle times where the flow breaks due to CPU-->GPU or GPU-->CPU data transfers.

If you examine the summary, you can see that sum_reduction64 takes at most 12 ms. After the first frame the same kernel runs in a few microseconds.

Warmpup.jpg

0 Likes

Hi Tomer,

Thanks for sharing the timing information. Yes, I can see the large difference in execution time.

As for a reproducible test-case of the OpenCL code, I can't due to it being intellectual property of my customers.

I understand. However, could you please share the host-side code flow (i.e. the sequence of API calls, kernel launches and other work) so that we can reproduce something similar at our end?

[For privacy, if you want, I can share my official email address where you can send the code or any other details.]

Interestingly, another user has also reported a similar inconsistency in the thread "Kernel Timing Anomalies".

[These two issues may not be related, but please take a look at that thread.]

BTW, I've a question. If a dummy kernel is executed before starting the actual processing, do you observe any improvement for the first run? Could you please check and share your observation?
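A warm-up pass of the kind asked about here could be sketched as follows. This is a minimal illustration in Python under stated assumptions, not AMD's recommended method: run_frame is a hypothetical callable that submits the work and waits for completion, and FakePipeline is a made-up stand-in that only models a one-time first-call cost so the sketch can run without a GPU.

```python
def warm_up(run_frame, passes=1):
    """Run the pipeline a few times and discard the results.

    The intent is to pay one-time costs (for example lazy allocation or
    cold caches) before the timed work begins. run_frame is hypothetical.
    """
    for _ in range(passes):
        run_frame()


class FakePipeline:
    """Made-up stand-in for a real pipeline: the first call does setup."""

    def __init__(self):
        self.initialized = False
        self.setup_calls = 0

    def __call__(self):
        if not self.initialized:
            # One-time cost that only the first call pays.
            self.setup_calls += 1
            self.initialized = True
        return "frame"
```

In a real application the warm-up would most likely need to use the same kernels, buffer sizes and transfer paths as the actual processing, since a trivial dummy kernel may not trigger the same one-time costs.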

First, regarding the Overdrive_Sample, after I attached the AMD GPU as display, it is recognized as adapterActive but still states: Can't get Overdrive capabilities

Sorry, I'm not aware of this. You may expect a separate reply from velan or someone else from the ADL team.

Regards,

0 Likes

Hi Tomer!

Bear in mind that keeping a steady clock has its disadvantages:

1.) When the GPU stays constantly at high clocks it also consumes its maximum TDP. Will it be acceptable for the medical device if the Pitcairn GPU alone consumes 130W all the time instead of 10W at idle? What about noise?

2.) Even with fixed high clocks, the first launch will always be slower because none of the data is cached yet (at least in the ISA cache and the constant cache).

Tzachi

0 Likes

Hi Tzachi

Yes, I'm aware of that.

I found one cause of the performance difference. I will start a new thread for it and should open a bug report as well.

Medical devices usually have idle time where nothing is being done; that's the time to set the power settings to default.

Then there is the processing time, a specific duration in which a patient is being scanned; that's when performance matters and everything is expected to be steady.

Anyway, as I said, the sample for controlling the power policy doesn't work.

Dipak, I can't send a reproducible test case because this is a very large piece of software which also wraps all of the OpenCL code, so it would take a lot of time to strip it down to something I could send.

Regards,

Tomer Gal

0 Likes