10 Replies Latest reply on May 15, 2012 6:36 AM by Raistmer

    AMD APP Profiler 2.4 is now available

    lbin

      The AMD APP Profiler is a performance analysis tool that gathers data from the OpenCL™ run-time and AMD Radeon™ GPUs during the execution of an OpenCL™ application. We can then use this information to discover bottlenecks in an application and find ways to optimize the application’s performance for AMD platforms.

       

      New updates in this version include:

       

      • Support for AMD APP SDK v2.6.
      • Added a kernel occupancy analyzer, which calculates and displays a kernel occupancy number estimating the number of in-flight wavefronts on a compute unit as a percentage of the theoretical maximum number of wavefronts that the compute unit can support
      • Added support for collecting symbol information when collecting an application trace, allowing navigation from the API Trace view to the source code that called an API
      • Improved OpenCL™ analysis module:
        • Added detection of non-optimized data transfer operations
        • Added detection of redundant synchronization operations
        • Improved detection of unnecessary blocking write operations
        • Improved analysis in multithreaded applications (fixed false positives)
      • Added support for specifying which OpenCL™ APIs will be traced
      • Added ability to rename sessions in the Session Explorer Window
      • Added ability to automatically delete profiler sessions when closing a Microsoft® Visual Studio® solution
      • Added support for modifying the parameters used to initiate a profiler session
      • Added support for multiple-GPU systems when collecting performance counters
      • Improved the CLPerfMarkerAMD library
      • Improved performance when using timeout mode
      • In the session window, "GPRs" column has been renamed "VGPRs" (vector GPRs)
      • Fixed a problem with loading saved counters from a file
      • Fixed a problem where the performance counter values for some kernel dispatch operations were reported as all zeros
      • Fixed a problem with missing GPU timestamps in an application trace when enabling the "Write trace data periodically during program execution" option
      • Removed Data Transfer data from the Session view for OpenCL™ applications.  It is recommended that you use the Application Trace view to get information on data transfers
      • Preview: Support for profiling with AMD Radeon™ HD7000 series GPUs (requires AMD APP SDK v2.6 and an AMD Catalyst version that supports this hardware)
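To make the occupancy definition above concrete, here is a minimal sketch of how such a percentage could be estimated from a kernel's resource usage. It is not the profiler's actual formula. The hardware limits (10 wavefront slots per SIMD, 4 SIMDs per compute unit, 256 VGPRs per SIMD lane, 64 KB of LDS per compute unit) are illustrative GCN-style assumptions, and the function name `estimate_occupancy` is hypothetical. A real analyzer may also account for limits that this sketch omits, such as scalar registers.

```python
def estimate_occupancy(vgprs_per_thread: int,
                       lds_bytes_per_group: int,
                       waves_per_group: int,
                       max_waves_per_simd: int = 10,
                       simds_per_cu: int = 4,
                       vgprs_per_simd_lane: int = 256,
                       lds_bytes_per_cu: int = 65536) -> float:
    """Estimated in-flight wavefronts per compute unit, as a percentage of
    the theoretical maximum. Default limits are illustrative assumptions."""
    if waves_per_group <= 0:
        raise ValueError("waves_per_group must be positive")
    theoretical_max = max_waves_per_simd * simds_per_cu

    # Limit 1: wave slots and vector registers. A wavefront needs
    # vgprs_per_thread registers in every lane, so one SIMD can hold at most
    # vgprs_per_simd_lane // vgprs_per_thread wavefronts.
    waves_per_simd = max_waves_per_simd
    if vgprs_per_thread > 0:
        waves_per_simd = min(waves_per_simd,
                             vgprs_per_simd_lane // vgprs_per_thread)
    limit = waves_per_simd * simds_per_cu

    # Limit 2: local data share (LDS). LDS is allocated per work-group,
    # so count how many whole work-groups fit.
    if lds_bytes_per_group > 0:
        groups_by_lds = lds_bytes_per_cu // lds_bytes_per_group
        limit = min(limit, groups_by_lds * waves_per_group)

    # Work-groups are scheduled whole, so round down to a multiple of
    # waves_per_group.
    in_flight = (limit // waves_per_group) * waves_per_group
    return 100.0 * in_flight / theoretical_max


if __name__ == "__main__":
    # Hypothetical kernel: 64 VGPRs per thread, no LDS, 4 waves per group.
    # 256 // 64 = 4 waves per SIMD -> 16 of 40 slots.
    print(estimate_occupancy(64, 0, 4))  # prints 40.0
```

The minimum over the individual limits is what makes occupancy useful for tuning: whichever resource runs out first (registers in the example) is the one worth reducing.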

       

       

      Please post your feedback here.

        • Re: AMD APP Profiler 2.4 is now available
          chesik

          For reference, the latest version of APP Profiler can be downloaded from the product page at:  http://developer.amd.com/tools/AMDAPPProfiler/Pages/default.aspx.

           

          Chris

          • Re: AMD APP Profiler 2.4 is now available
            Raistmer

            I have used APP Profiler 2.4 for some time on two different hosts and can now give some feedback on it.

            1)

            New updates in this version include

             

            • Removed Data Transfer data from the Session view for OpenCL™ applications.  It is recommended that you use the Application Trace view to get information on data transfers

             

            Please post your feedback here.

             

            Please bring back the memory transfer data in the CSV file!!!

            I mostly use the sprofile command-line version, and the lack of buffer transfer data makes a very negative impression.

            I could probably get this data from an --apitrace run, but doing a separate run just for this doubles the manual work.

            It's very inconvenient. You could add a switch to the command-line version to include or exclude memory transfer profiling data in the counters CSV file, if for some reason you think that leaving this data out is a "cool" feature.

             

            2) Why I mostly stick to the command-line version: the Visual Studio plugin still can't work with ICC projects.

            So an Intel ICC-based project can't be profiled inside Visual Studio. I checked, and an MSVC-based project in the same solution can be profiled, so I think my setup is OK; it's just a lack of ICC support... But anyone who really cares about performance will use ICC for host code... Please don't forget to improve ICC compatibility in the next version.

             

            3) On one of my hosts I have both VS2005 and VS2008 installed. On this host the APP Profiler plugin refuses to work in VS2008.

            I tried reinstalling... it just doesn't work. So, again, only the command-line sprofile is usable...

            On the host with only VS2008 installed, the plugin works.

             

            4) My MSVC-based project has $(OutDir)/app_name.exe in the Output File property.

            The APP Profiler MSVC plugin can't find the executable. It hangs for some time, then reports that it is unable to find the executable. I then manually enter the correct path and it starts the profiling run. And the most annoying thing: it can't remember the path I already supplied! So I have to point it to the executable again and again, even within the same profiling session. I think this interface bug could easily be fixed in the next version.

             

            Apart from the problems listed, APP Profiler is a very useful tool; I hope it will be made even better in the next release.

              • Re: AMD APP Profiler 2.4 is now available
                chesik

                Hi Raistmer,

                 

                We had received reports that the timing data shown in the .csv file for data transfer operations was inaccurate. Upon investigation, we discovered that the data could in fact be incorrect. This is a side effect of how the collection of performance counters works. In order to get all of the performance counters, the profiler must replay (re-dispatch) each kernel multiple times (there are limits on the number of performance counters the profiler can query from the hardware for each kernel dispatch). The profiler must introduce additional data transfers between the host and the device to make sure that each replayed kernel dispatch operates on the same data. Investigation showed that these additional memory transfers could affect the timing results of the original data transfer operations.
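The replay mechanism described above can be illustrated with a small simulation. It is a sketch under stated assumptions, not APP Profiler's implementation: the counters-per-pass limit, the buffer names, and the helpers `passes_needed` and `replay_with_restore` are hypothetical. It shows why a kernel that modifies its own input needs its buffers saved and restored around each extra pass, and how those copies add transfers the application never issued.

```python
import copy
import math


def passes_needed(num_counters: int, counters_per_pass: int) -> int:
    """Number of dispatches needed when only counters_per_pass counters
    can be sampled per dispatch."""
    return math.ceil(num_counters / counters_per_pass)


def replay_with_restore(kernel, buffers, num_counters, counters_per_pass):
    """Dispatch `kernel` once per counter pass so every pass sees the same
    input. Returns (final_buffers, extra_transfers), where extra_transfers
    counts buffer copies the application itself never requested."""
    passes = passes_needed(num_counters, counters_per_pass)
    extra_transfers = 0
    snapshot = None
    if passes > 1:
        # Save the inputs before the first dispatch (device -> host copies).
        snapshot = copy.deepcopy(buffers)
        extra_transfers += len(buffers)
    for p in range(passes):
        if p > 0:
            # Restore the inputs before each replay (host -> device copies).
            buffers = copy.deepcopy(snapshot)
            extra_transfers += len(buffers)
        kernel(buffers)
    return buffers, extra_transfers


if __name__ == "__main__":
    def add_one(bufs):
        # An in-place kernel: without the restore step, each replay would
        # see the previous pass's output instead of the original input.
        bufs["a"] = [x + 1 for x in bufs["a"]]

    result, extra = replay_with_restore(add_one, {"a": [1, 2, 3], "b": [0]},
                                        num_counters=10, counters_per_pass=4)
    # 3 passes: 2 saves + 2 restores x 2 buffers = 6 extra copies.
    print(result["a"], extra)  # prints [2, 3, 4] 6
```

If those extra copies share the bus and the timeline with the application's own transfers, the measured durations of the original transfers can be skewed, which matches the explanation above.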

                 

                In those cases, the data transfer information shown when doing an apitrace profile is correct (since the profiler doesn't replay any kernel dispatches when collecting a trace).  So it was decided to remove the potentially misleading data from the .csv file and point users toward the correct data in the .atp file.

                 

                Regarding your other points:

                2)  I will add an item to our internal issue tracking system to investigate using the profiler with a project that uses the Intel ICC compiler.

                3)  I will add an item to investigate this issue as well.  Can you let me know what the symptoms are that you see?  Does the profiler give an error message in this case?

                4) Does your project have a "Command" property set on the "Debugging" page?  The profiler should use this setting to figure out which .exe to launch when you start a profiling session.  I agree that the profiler should remember the overridden setting -- that sounds like a bug that it's not doing that.  We will investigate.

                 

                Thanks for your feedback,

                Chris

                  • Re: AMD APP Profiler 2.4 is now available
                    Raistmer

                    Hi Chris,

                    thanks for the explanation.

                    1) 

                    Unfortunately, I can't use trace mode at all with sprofile 2.4. It always results in a "failed to generate profile result" message...

                    I asked one of my testers to get trace data on his host, but he got the same error.

                    The CSV file with counter data is generated without any problems on our hosts...

                    What could we try to get trace data with the command-line profiler?

                     

                    Regarding other topics:

                    3)

                    VS2005/VS2008 installation: no, it doesn't show any errors.

                    It's just impossible to open the APP Profiler Session Explorer tab or any of its other tabs.

                    They are listed under View->Other Windows, but clicking any of the APP Profiler windows in the list does nothing. No new window appears and the menu just closes.

                     

                    4)

                    Yes, the Command property is $(TargetPath) and the Working Directory is $(TargetDir).

                    I checked, and both point to the correct location where the executable resides...

                    I tried again, with the same symptoms: after I press the start button, the status bar shows "AMD APP Profiler is profiling...", but the app doesn't start and a message box with the text "Can't retrieve the active project executable path." appears.

                    After I press OK, the APP Profiler Session Parameters window appears. The active project directory is set correctly, but the application path shows the same error as the message box. At this point I enter the executable path and start profiling.

                  • Re: AMD APP Profiler 2.4 is now available
                    pasniak

                    Re 4): Same bug here (v2.4.1294). I have a gpu_dir variable set up in a VS2010 property sheet and pass it to the program in Command Arguments: $(gpu_dir). The program hangs, and I see that the value passed to the program is the literal $(gpu_dir) instead of C:\my_gpu_data.
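Both reports suggest that macros such as $(OutDir) or $(gpu_dir) reach the plugin unexpanded. The sketch below only illustrates what the expected expansion looks like, and fails loudly instead of passing a literal macro through. The `expand_macros` helper and the sample values are hypothetical; real MSBuild property evaluation is more involved, and this is not how Visual Studio or the plugin resolves properties.

```python
import re

# Matches $(Name) where Name is letters, digits or underscores.
_MACRO = re.compile(r"\$\((\w+)\)")


def expand_macros(text: str, macros: dict) -> str:
    """Replace every $(Name) in text with macros[Name].

    An unknown macro raises KeyError rather than being passed through
    literally, which is the symptom reported above.
    """
    def repl(match):
        name = match.group(1)
        if name not in macros:
            raise KeyError(f"unexpanded macro $({name}) in {text!r}")
        return macros[name]  # a function's return value is used verbatim

    return _MACRO.sub(repl, text)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    macros = {"OutDir": "C:\\proj\\Release", "gpu_dir": "C:\\my_gpu_data"}
    print(expand_macros("$(OutDir)/app_name.exe", macros))
    print(expand_macros("$(gpu_dir)", macros))
```

Using a replacement function (rather than a replacement string) matters here, because Windows paths contain backslashes that `re.sub` would otherwise interpret as escapes.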

                  • Re: AMD APP Profiler 2.4 is now available
                    Raistmer

                    I have one question and one bug report.

                    Bug report: if the profiler (run as a VS plugin) collects too much data (~200M, but still under the default API trace event limit), it is unable to create the final trace data file (.atp) from the intermediate ones. Sprofile.exe wrote the required files from the temporary directory to the session directory, but the next step in data processing fails.
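One general way a post-processing step can avoid holding hundreds of megabytes of trace data in memory is to merge the already time-sorted intermediate files as streams. The sketch below assumes a hypothetical intermediate format of one `timestamp<TAB>event` record per line, with each file sorted by timestamp; APP Profiler's real intermediate format and merge logic are not described in this thread, and `merge_traces` is a hypothetical name.

```python
import heapq
import io
from typing import Iterable, Iterator, TextIO, Tuple


def _records(f: TextIO) -> Iterator[Tuple[int, str]]:
    """Yield (timestamp, event) pairs from one intermediate file."""
    for line in f:
        line = line.rstrip("\n")
        if line:
            ts, event = line.split("\t", 1)
            yield int(ts), event


def merge_traces(inputs: Iterable[TextIO], out: TextIO) -> int:
    """Merge timestamp-sorted inputs into out and return the record count.

    heapq.merge pulls records lazily, holding roughly one pending record
    per input rather than the full data set.
    """
    count = 0
    for ts, event in heapq.merge(*(_records(f) for f in inputs)):
        out.write(f"{ts}\t{event}\n")
        count += 1
    return count


if __name__ == "__main__":
    a = io.StringIO("1\tclEnqueueWriteBuffer\n5\tclFinish\n")
    b = io.StringIO("3\tclEnqueueNDRangeKernel\n")
    out = io.StringIO()
    merge_traces([a, b], out)
    print(out.getvalue(), end="")
```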

                     

                    Question: is the Loveland APU (namely, the C-60) supported in APP Profiler 2.4?

                    That is, does the profiler return correct data on this device? I ask because I see huge variations both in read/write buffer API call times and in the actual device read/write operations (for the same data sizes). The variations are as large as an order of magnitude. So, is the profiler gathering incorrect data, or is this a natural characteristic of the APU device?
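To put a number on an "order of magnitude" variation, one can group the durations reported for the same transfer size and compare the slowest to the fastest. The sketch assumes the durations have already been exported from the trace as (size_bytes, duration_ns) pairs; the helper name `spread_by_size` and the sample figures are made up for illustration and are not measurements from this thread.

```python
from collections import defaultdict
from statistics import median


def spread_by_size(samples):
    """Group (size_bytes, duration_ns) pairs by size and return
    {size: (min, median, max, max/min)} for each size."""
    groups = defaultdict(list)
    for size, duration in samples:
        groups[size].append(duration)
    report = {}
    for size in sorted(groups):
        durations = groups[size]
        fastest, slowest = min(durations), max(durations)
        ratio = slowest / fastest if fastest > 0 else float("inf")
        report[size] = (fastest, median(durations), slowest, ratio)
    return report


if __name__ == "__main__":
    # Made-up durations for illustration, not data from the thread.
    samples = [(4096, 10_000), (4096, 12_000), (4096, 110_000), (65536, 50_000)]
    for size, (lo, med, hi, ratio) in spread_by_size(samples).items():
        print(f"{size:>6} B  min={lo} median={med} max={hi} max/min={ratio:.1f}")
```

A max/min ratio of 10 or more for a single size is an order-of-magnitude spread; comparing the median against the maximum also shows whether the spread comes from a few outliers or from consistently noisy timings.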

                     

                    Also, why is the ReadBuffer API call _so_ costly if it is a synchronous one? I understand that it can trigger the actual kernel execution before the data transfer, but there is a big delay before the kernel starts to execute and also after the data read finishes on the device. Why such a big overhead? (Same question here: incorrect data or APU-specific behavior?)
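OpenCL's event profiling info (CL_PROFILING_COMMAND_QUEUED, CL_PROFILING_COMMAND_SUBMIT, CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, in nanoseconds, queried with clGetEventProfilingInfo) can help separate the delays described above. The sketch below takes those four values plus host-measured entry and exit times of the blocking read and splits the call into consecutive phases. The `ReadTimes` and `breakdown` names and the sample numbers are hypothetical, and it assumes host and device timestamps share a time base, which is not guaranteed in practice.

```python
from dataclasses import dataclass


@dataclass
class ReadTimes:
    # Device-side timestamps for the read command, in nanoseconds, as
    # reported by clGetEventProfilingInfo.
    queued: int
    submit: int
    start: int
    end: int
    # Host-side entry and exit of the blocking read call, in nanoseconds.
    # ASSUMPTION: same time base as the device timestamps.
    host_enter: int
    host_exit: int


def breakdown(t: ReadTimes) -> dict:
    """Split a blocking read's wall time into consecutive phases that sum
    to the total call time."""
    return {
        "enter_to_queued": t.queued - t.host_enter,
        "queued_to_submit": t.submit - t.queued,
        # In an in-order queue this gap can include waiting for earlier
        # commands, such as the kernel the read depends on.
        "submit_to_start": t.start - t.submit,
        "transfer": t.end - t.start,
        "end_to_return": t.host_exit - t.end,
        "total_call": t.host_exit - t.host_enter,
    }


if __name__ == "__main__":
    # Made-up numbers for illustration.
    t = ReadTimes(queued=1_000, submit=1_500, start=41_500, end=45_500,
                  host_enter=900, host_exit=60_000)
    for phase, ns in breakdown(t).items():
        print(f"{phase:>16}: {ns} ns")
```

A large "submit_to_start" points at queueing or dependency waits before the transfer, while a large "end_to_return" points at host-side completion overhead after the device has finished.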

                     

                    Here is a picture to illustrate what I'm talking about: http://imageshack.us/photo/my-images/32/appprofiler24c60.png/

                      • Re: AMD APP Profiler 2.4 is now available
                        lbin

                        Hi Raistmer

                         

                        With regard to your first question, it sounds like your machine ran out of memory. What intermediate files did you find?

                         

                        Loveland is supported. What trace mode did you use? Did you see huge variations in non-timeout mode?

                         

                        Thanks

                          • Re: AMD APP Profiler 2.4 is now available
                            Raistmer

                            Could you explain what "timeout mode" is? Is it "write data to disk periodically"? If so, then yes, I use this mode, writing to disk every 100 ms. Should I disable it to get more accurate data?

                             

                            From time to time, very long kernel execution times are reported for the same workload...

                             

                            Regarding the intermediate files I got: I can't say for sure, but there were some files in the same session directory where the resulting .atp file should be. So the temporary files were collected OK.

                            I use a swap file on this PC, so "out of memory" should not happen... In any case, if the program hits an out-of-memory error, it would be better to report that explicitly to the user instead of a generic "can't do" error with no explanation of why it can't. I hope this will be improved in the next APP Profiler release. If this app returns reliable data, it can be a very valuable tool.