Hi erotavlas, I got these answers from one of our engineers working on the Open64 compiler just before he was getting on a plane.
Question #2. There are many compiler options you can try to
improve performance on your Opteron 6168, but I'd suggest starting with the
quick reference sheet: http://developer.amd.com/assets/CompilerOptQuickRef-61004100.pdf and
setting the appropriate architecture (e.g. -march=barcelona) and then working
with optimization levels (-O1, -O2, -O3, -Ofast) before trying some of the more
complex options available.
Question #1. As for the differences between Open64 and GCC: gcc
is your platform compiler, and its performance has improved in later
releases. Open64 has a number of advanced optimizations specifically
targeted at higher performance, though it also works to maintain compatibility
with gcc. I would suggest trying both with your application. Open64 has
general optimizations that should apply to both AMD and non-AMD platforms, as
well as architecture switches for some non-AMD platforms. However, almost
all AMD work has gone into improving Open64 for the latest AMD processors.
Question #3. The ACML library has a specific set of library
entry points. You might already be using some of these routine names in
your code, in which case it is just a matter of linking with the ACML library.
If not, you'll need to call the specific ACML functions.
Hope that answers your questions. If so, please push the button indicating that it has been answered. Thanks!
As you suggested, I have tried this guide: http://developer.amd.com/Assets/CompilerOptQuickRef-61004100.pdf. I'm compiling with GCC 4.6.2 under Linux. The parameters from the guide are the following:
-O3 -march=barcelona -fschedule-insns -fschedule-insns2 -fsched-pressure -funroll-all-loops -fprefetch-loop-arrays --param prefetch-latency=300 -minline-all-stringops -fno-tree-pre -ftree-vectorize.
According to the GCC documentation (http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#Optimize-Options), -funroll-all-loops usually makes programs run more slowly. The parameter -fprefetch-loop-arrays may or may not improve performance, depending on the code. So my question is about the reliability of the parameters.
I have found that these parameters are the best for my hardware and my code:
-O3 -m64 -s -fsplit-stack -fwhole-program -flto -march=barcelona -fschedule-insns -fschedule-insns2 -fsched-pressure -ftree-loop-distribution -fpeel-loops -mmovbe -mcrc32 -mcx16 -msahf -mvzeroupper -msahf -mtls-dialect=gnu2 -minline-stringops-dynamically -mno-align-stringops -maccumulate-outgoing-args -funroll-loops -fprefetch-loop-arrays --param prefetch-latency=300
The parameters in the quick reference guide are a suggested set of options to try. In some cases -funroll-all-loops improves performance and in others it degrades it, which is why the documentation on the gcc page says it “usually” makes programs run more slowly. It is always best to try and test, which it looks like you did. Glad to see that you have found the options that work best for your program.
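For anyone who wants to repeat the "try and test" approach, here is a rough Python sketch of how it could be automated. It is only an illustration, not something from the quick reference guide: the source file bench.c, the output name bench, the candidate names, and the helper functions are all hypothetical, and the flag sets are just examples built from options mentioned in this thread.

```python
import shlex
import subprocess
import time

# Hypothetical candidate flag sets, built from options discussed in this thread.
CANDIDATES = {
    "O2": "-O2 -march=barcelona",
    "O3": "-O3 -march=barcelona",
    "O3-unroll": "-O3 -march=barcelona -funroll-loops",
    "O3-unroll-all": "-O3 -march=barcelona -funroll-all-loops",
}


def build_command(flags, source="bench.c", output="bench", compiler="gcc"):
    """Return the argv list that compiles `source` with the given flag string."""
    return [compiler, *shlex.split(flags), source, "-o", output]


def time_binary(path, runs=5):
    """Run the binary `runs` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([path], check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def rank(timings):
    """Turn {name: seconds} into a list of (name, seconds), fastest first."""
    return sorted(timings.items(), key=lambda item: item[1])


def benchmark(candidates=CANDIDATES):
    """Compile and time every candidate; assumes bench.c exists in the current directory."""
    timings = {}
    for name, flags in candidates.items():
        subprocess.run(build_command(flags), check=True)
        timings[name] = time_binary("./bench")
    for name, seconds in rank(timings):
        print(f"{name:16s} {seconds:.3f} s")
```

Calling benchmark() from the directory that holds your benchmark source prints the candidates from fastest to slowest. Taking the best of several runs is one simple way to reduce noise; on a busy machine you may want more runs or a different statistic.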