3 Replies Latest reply on Feb 24, 2012 4:33 PM by stroia

    Optimize performance of AMD opteron


      Hi all,


      I'm new to this forum and have several questions for you. I have a full AMD system with four AMD Opteron 6168 CPUs and I would like to get the best performance from it. Let me start:

      1) I'm using the latest version of GCC and would like to know the main differences between it and Open64. I know only GCC and have never used Open64. Is Open64 also compatible with processors from other vendors? Which of the two is better?


      2) What are the best options to get the best performance from my system with the GCC compiler?


      3) ACML can be used with GCC. I would like to know whether the library's functions must be called directly, or whether the compiler uses the library automatically to optimize performance.


      Thank you




      Message was edited by: Sharon Troia (reduced the length of the title, it was causing us not to be able to respond)

        • Re: Optimize performance of AMD opteron

          Hi erotavlas, I got these answers from one of our engineers working on the Open64 compiler just before he was getting on a plane.

          Question #2. There are many compiler options you can try to
          improve performance on your Opteron 6168, but I'd suggest starting with the
          quick reference sheet: http://developer.amd.com/assets/CompilerOptQuickRef-61004100.pdf and
          setting the appropriate architecture (e.g. -march=barcelona) and then working
          with optimization levels (-O1, -O2, -O3, -Ofast) before trying some of the more
          complex options available.
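
When experimenting with -march and the -O levels, it can help to confirm which settings actually reached the compiler. The following is a minimal sketch, not part of the original answer: `build_summary` is a hypothetical helper name, while the macros it tests (`__OPTIMIZE__`, `__FAST_MATH__`, `__amdfam10__`, `__SSE4A__`) are predefined by GCC itself depending on the options given.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: describe how this file was compiled, using
   macros that GCC predefines according to the command-line options. */
const char *build_summary(char *buf, size_t len)
{
    const char *opt = "no";       /* stays "no" at -O0 */
    const char *fastmath = "no";  /* "yes" with -ffast-math or -Ofast */
    const char *arch = "other";   /* "amdfam10" with -march=barcelona */
    const char *sse4a = "no";     /* "yes" when SSE4a is enabled */
#ifdef __OPTIMIZE__
    opt = "yes";
#endif
#ifdef __FAST_MATH__
    fastmath = "yes";
#endif
#ifdef __amdfam10__
    arch = "amdfam10";
#endif
#ifdef __SSE4A__
    sse4a = "yes";
#endif
    snprintf(buf, len, "optimized=%s fast-math=%s arch=%s sse4a=%s",
             opt, fastmath, arch, sse4a);
    return buf;
}
```

Printing the returned string from `main` after building with, for example, `gcc -O3 -march=barcelona` should report the architecture as amdfam10; a build without -march would be expected to report "other".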


          Question #1. As for the differences between Open64 and GCC: GCC
          is your platform compiler, and its performance has improved in later
          releases.  Open64 has a number of advanced optimizations and is specifically
          targeted at higher performance, while also working to maintain compatibility
          with GCC.  I would suggest trying both with your application.  Open64 has
          general optimizations that should apply to both AMD and non-AMD platforms, as
          well as architecture switches for some non-AMD platforms.  However, almost
          all of AMD's work has gone into improving Open64 for the latest AMD processors.


          Question #3. The ACML library has a specific set of library
          entry points.  You might already be using some of these routine names in
          your code, in which case it is just a matter of linking with the ACML library.
          If not, you will need to call the ACML functions explicitly.
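
To illustrate the difference between linking against ACML and relying on the compiler alone, here is a hedged sketch. `USE_ACML` and `matmul` are hypothetical names introduced for this example. The `dgemm` call assumes ACML's C interface from `acml.h`, which takes scalar arguments by value; please check the prototype in your installed header before relying on it.

```c
#ifdef USE_ACML      /* hypothetical macro: define it yourself with -DUSE_ACML */
#include <acml.h>    /* ACML's C header; location depends on your installation */
#endif

/* C = A * B for n x n matrices in column-major order (the BLAS convention). */
void matmul(int n, double *a, double *b, double *c)
{
#ifdef USE_ACML
    /* Explicit library call to the BLAS routine DGEMM
       (argument order assumed from acml.h). */
    dgemm('N', 'N', n, n, n, 1.0, a, n, b, n, 0.0, c, n);
#else
    /* Plain C fallback: GCC compiles this loop nest as ordinary code. */
    int i, j, k;
    for (j = 0; j < n; j++) {
        for (i = 0; i < n; i++) {
            double sum = 0.0;
            for (k = 0; k < n; k++)
                sum += a[i + k * n] * b[k + j * n];
            c[i + j * n] = sum;
        }
    }
#endif
}
```

Building with `-DUSE_ACML` and linking with `-lacml` (plus the include and library paths of your ACML installation, and any runtime libraries that build requires) selects the library path. Without the macro, the same source compiles with no dependency on ACML.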

          Hope that answers your questions.  If so, please push the button indicating that it has been answered.  Thanks!

            • Re: Optimize performance of AMD opteron

              Hi stroia,




              As you suggested, I tried the guide at http://developer.amd.com/Assets/CompilerOptQuickRef-61004100.pdf. I'm compiling with GCC 4.6.2 under Linux. The options from the guide are the following:


              -O3 -march=barcelona -fschedule-insns -fschedule-insns2 -fsched-pressure -funroll-all-loops -fprefetch-loop-arrays --param prefetch-latency=300 -minline-all-stringops -fno-tree-pre -ftree-vectorize


              The GCC documentation (http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options) says that -funroll-all-loops usually makes programs run more slowly. The option -fprefetch-loop-arrays may or may not improve performance, depending on the code. So my question is how reliable the options in the guide are.


              I have found that these options work best for my hardware and my code:


              -O3 -m64 -s -fsplit-stack -fwhole-program -flto -march=barcelona -fschedule-insns -fschedule-insns2 -fsched-pressure -ftree-loop-distribution -fpeel-loops -mmovbe -mcrc32 -mcx16 -msahf -mvzeroupper -msahf -mtls-dialect=gnu2 -minline-stringops-dynamically -mno-align-stringops -maccumulate-outgoing-args -funroll-loops -fprefetch-loop-arrays --param prefetch-latency=300




              Thank you

                • Re: Optimize performance of AMD opteron

                  The parameters in the quick reference guide are a suggested set of options to try.  In some cases -funroll-all-loops improves performance and in others it degrades it, which is why the documentation on the GCC page says it “usually” makes programs run more slowly.  It is always best to try and test, which it looks like you did.  Glad to see that you have found the options that work best for your program.
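
For anyone who wants to repeat this kind of try-and-test comparison, a small timing harness is one possible starting point. This is only a sketch under stated assumptions: `axpy` and `time_axpy` are hypothetical names, and the kernel is a stand-in that should be replaced with a hot loop from the real program.

```c
#include <stddef.h>
#include <time.h>

/* Build the same file once per candidate option set and compare, e.g.
     gcc -O2 -march=barcelona bench.c -o bench_O2
     gcc -O3 -march=barcelona -funroll-loops bench.c -o bench_O3_unroll */

/* Stand-in kernel: y[i] += a * x[i]. */
void axpy(size_t n, double a, const double *x, double *y)
{
    size_t i;
    for (i = 0; i < n; i++)
        y[i] += a * x[i];
}

/* Run the kernel `reps` times and return the CPU time used, in seconds. */
double time_axpy(size_t n, int reps, const double *x, double *y)
{
    int r;
    clock_t start = clock();
    for (r = 0; r < reps; r++)
        axpy(n, 0.5, x, y);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling `time_axpy` from `main` with arrays large enough to run for a second or more, and repeating each measurement a few times, tends to give more stable numbers than a single short run. The results only indicate how the options behave on this kernel, so measuring the actual application remains the deciding test.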