12 Replies Latest reply on Oct 12, 2015 9:03 AM by tomer_gal

    Performance - GPU Power management

    tomer_gal

      Hi,

      My name is Tomer Gal. I am the CTO of OpTeamizer, a company that does OpenCL development and consulting for various companies.

       

      Now to the question...

      I assume the GPU does power management and therefore initially runs at a lower frequency/voltage, which results in lower performance until it transitions to a high frequency/voltage.

      The goal is to be able to put the GPU in high performance mode and not suffer from the initial slowness.

       

      What is the API for controlling this?

       

      Regards,

      Tomer Gal

        • Re: Performance - GPU Power management

          Welcome! I have whitelisted you and moved this into the OpenCL forum.

           

          pinform

          • Re: Performance - GPU Power management
            dipak

            Hi Tomer,

            Just to let you know, I've asked some experienced folks about this and am trying to get relevant information from them. As soon as I have any, I'll share it with you.

             

            Regards,

            • Re: Performance - GPU Power management
              velan

              Tomer,

               

                        Can you share the time window you observe for this low performance? In other words, how long does it take for the clock to switch from idle to the required performance level?

               

                        We have a few internal tools that can set the GPU to a fixed clock, but unfortunately they are not available to the public. It might help to get in touch with an AMD field application engineer.

               

                        I have a few recommendations:

                           - If you are using an AMD CPU, can you check in the BIOS: Chipset -> GFX Configuration -> PSPP Policy -> Disabled? Note that this varies from OEM to OEM.

               

                           - Can you also explore the ADL SDK APIs (_PowerControl_Set(), _State_Set(), _PerformanceLevels_Set())? I am still checking on the support level for these APIs and will keep you posted. The ADL SDK can be accessed from http://developer.amd.com/tools-and-sdks/graphics-development/display-library-adl-sdk/

               

              Thanks,

              Velan

                • Re: Performance - GPU Power management
                  tomer_gal

                  Hi Velan,

                  I will check the ADL SDK, expecting it to solve the problem.

                  As for the time window, it varies greatly. I have an OpenCL image processing filter made of many kernels. This filter normally runs in 9 ms; however, the first time it runs, the time can vary from 13 ms to 120 ms.

                  Examining this in CodeXL shows:

                  1. Kernels that should normally run for a few microseconds are running for a few milliseconds.

                  2. There are bigger idle gaps around the code that moves data from GPU to CPU and then back from CPU to GPU. If I remember correctly, there was a gap between the time the API was invoked and the time it was actually performed on the GPU, which leads me to believe the commands are not being flushed.

                   

                  As this work is for the healthcare segment, it is not acceptable for any frame to exceed the frame time budget, especially when this is consistent behavior that occurs on the first frame and can make it more than 4x slower.
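
                  To make the budget check concrete, here is a minimal sketch that takes per-frame durations (for example, read off a CodeXL timeline) and reports the first-frame slowdown plus any frames over budget. The names `analyze_frames` and `FrameReport` and the budget value used in the note below are hypothetical, chosen only for illustration; they are not part of OpenCL, CodeXL, or ADL.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class FrameReport:
    steady_ms: float    # median duration of all frames after the first
    first_ms: float     # duration of the first frame
    slowdown: float     # first_ms divided by steady_ms
    over_budget: list   # indices of frames whose duration exceeds the budget


def analyze_frames(frame_ms, budget_ms):
    """Summarize first-frame slowdown and list frames that exceed budget_ms."""
    if len(frame_ms) < 2:
        raise ValueError("need the first frame plus at least one later frame")
    steady = median(frame_ms[1:])
    over = [i for i, t in enumerate(frame_ms) if t > budget_ms]
    return FrameReport(steady, frame_ms[0], frame_ms[0] / steady, over)
```

                  With the figures from this post, `analyze_frames([120.0, 9.0, 9.0, 9.0], budget_ms=16.0)` flags only frame 0 as over budget; the 16 ms budget is an assumed value, not one taken from the actual project.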

                   

                  I will use the ADL SDK and report whether it solves the problem.

                   

                  Thank you Velan!

                    • Re: Performance - GPU Power management
                      tomer_gal

                      Hi,

                      I've been trying to run the Overdrive_Sample, which I think is the sample relevant to my needs.

                      Getting the following error:

                      Cannot find active AMD adapter

                       

                      The setup is:

                      Windows 8 64 bit

                      AMD W7000

                      Using latest AMD drivers.

                      Intel CPU with HD Graphics

                       

                      When debugging, it finds the AMD adapter but it is not active.

                      What is the reason? Is it possible that this happens due to the usage of the Intel HD Graphics? If so, should the solution be to ignore the fact that the AMD adapter is not active?

                       

                      Regards,

                      Tomer Gal

                        • Re: Performance - GPU Power management
                          tomer_gal

                          I tried ignoring the adapterActive field. That does not help, because the next error is: Can't get Overdrive capabilities

                          As a side note: I also tried running as admin and in Windows 7 compatibility mode.

                          Neither works, so I guess AMD support is required on this issue.

                            • Re: Performance - GPU Power management
                              velan

                              Tomer,

                                        Based on your CodeXL observations and run times varying up to 120 ms, I feel that this has nothing to do with a GPU clock warm-up issue. I confirmed that when an OpenCL application is run, the GPU jumps to high performance quickly; 13 ms to 120 ms is a huge number here.

                                       

                                        I feel this is something specific to OpenCL. I am not an OpenCL expert, so please stay tuned while we get an OpenCL expert to answer here.

                               

                                        On a minor note, the ADL failure could be because no display is connected to the AMD GPU.

                              Thanks

                              Velan

                                • Re: Performance - GPU Power management
                                  dipak

                                  Thanks velan for sharing the information.

                                   

                                  Hi Tomer,

                                  As Velan suspects that the varying run time issue may be related to OpenCL rather than GPU clock warmup performance, could you please provide a reproducible test-case that manifests the same problem?

                                   

                                  Regards,

                                    • Re: Performance - GPU Power management
                                      tomer_gal

                                      Hi dipak,

                                      First, regarding the Overdrive_Sample, after I attached the AMD GPU as display, it is recognized as adapterActive but still states: Can't get Overdrive capabilities

                                      As for a reproducible test-case of the OpenCL code, I can't due to it being intellectual property of my customers.

                                      However, I am attaching a screenshot of CodeXL showing the issue. As you can see, among the first 3 frames, the first is slower than either of the other 2.

                                      The first frame is slower for 2 reasons: kernels consume more time, and idle times are larger where the flow breaks for CPU-->GPU or GPU-->CPU data transfers.

                                       

                                      If you examine the summary, you can see that sum_reduction64 takes up to 12 ms. After the first frame, the same kernel runs in a few microseconds.

                                       

                                       

                                      Warmpup.jpg

                                        • Re: Performance - GPU Power management
                                          dipak

                                          Hi Tomer,

                                           

                                          Thanks for sharing the timing information. Yes, I can see the large difference in execution time.

                                           

                                          "As for a reproducible test-case of the OpenCL code, I can't due to it being intellectual property of my customers."

                                          I understand. However, could you please share the host-side code flow (i.e., the sequence of API calls and other work, kernel launches, etc.) so that we can reproduce something similar at our end?

                                          [For privacy, if you want, I can share my official email address where you can send the code or any other details.]

                                           

                                          Interestingly, another user has also reported a similar inconsistency in this thread: Kernel Timing Anomalies.

                                          [These two issues may not be related, but please have a look at that thread.]

                                           

                                          BTW, I have a question. If a dummy kernel is executed before the actual processing starts, do you observe any improvement in the first run? Could you please check and share your observation?
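
                                          One way to run that experiment is sketched below: an untimed warm-up pass, followed by timed frames. `run_with_warmup` and `process_frame` are hypothetical names; `process_frame` stands in for whatever enqueues the kernels for one frame, and it is assumed to block until the GPU work has finished (for instance by calling clFinish), otherwise the timings only measure enqueue cost.

```python
import time


def run_with_warmup(process_frame, frames, warmup_input, warmup_runs=1):
    """Run untimed warm-up passes, then time each real frame in milliseconds."""
    for _ in range(warmup_runs):
        # Result is discarded; this pass only exercises the same code path
        # so that one-time costs are paid before timing starts.
        process_frame(warmup_input)

    timings_ms = []
    for frame in frames:
        start = time.perf_counter()
        process_frame(frame)  # assumed to block until the frame is complete
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return timings_ms
```

                                          Comparing the first timed frame with and without the warm-up pass (`warmup_runs=0`) would show how much of the first-run cost a dummy launch absorbs.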

                                           

                                          "First, regarding the Overdrive_Sample, after I attached the AMD GPU as display, it is recognized as adapterActive but still states: Can't get Overdrive capabilities"

                                          Sorry, I'm not aware of this. You may expect a separate reply from velan or someone else from the ADL team.

                                           

                                           

                                          Regards,

                            • Re: Performance - GPU Power management
                              tzachi.cohen

                              Hi Tomer !

                               

                              Bear in mind that keeping a steady clock has its disadvantages:

                              1.) When the GPU stays constantly at high clocks, it also consumes its maximum TDP. Would it be acceptable for the medical device if the Pitcairn GPU alone consumed 130 W all the time instead of 10 W at idle? What about noise?

                               

                              2.) Even with fixed high clocks, the first launch will always be slower because nothing is cached yet (at least in the ISA cache and the constant cache).

                               

                              Tzachi

                                • Re: Performance - GPU Power management
                                  tomer_gal

                                  Hi Tzachi

                                  Yes, I'm aware of that.

                                  I found one cause of the performance difference. I will start a new thread for it and should open a bug as well.

                                   

                                  Medical devices usually have idle time where nothing is being done; that is the time to set the power settings to default.

                                  Then there is the processing time, a specific duration in which a patient is being scanned; that is when performance matters and everything is expected to be steady.
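
                                  The idle/processing split can be sketched as a scoped switch. This only illustrates the intended control flow: `set_high` and `set_default` are hypothetical callables standing in for whatever mechanism ends up controlling the power policy, and no working public API for that has been identified in this thread.

```python
from contextlib import contextmanager


@contextmanager
def high_performance(set_high, set_default):
    """Hold the high-performance setting for the duration of one scan."""
    set_high()             # hypothetical: request the steady, high-clock state
    try:
        yield
    finally:
        set_default()      # always restore default power management afterwards
```

                                  The scan would then run inside `with high_performance(set_high, set_default): ...`, so the default policy is restored even if processing raises an error.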

                                   

                                  Anyway, as I said, the sample for controlling the power policy doesn't work.

                                  Dipak, I can't send a reproducible test case because this is a very large piece of software that also wraps all of the OpenCL code, so it would take a lot of time to strip it down to something I could send.

                                   

                                  Regards,

                                  Tomer Gal