    About Optimization of Opteron


      I'm benchmarking an Opteron cluster using High Performance Lapack.

      When compiling with the PGI optimization options '-fastsse -tp=shanghai-64', the final performance is a little bit LOWER than with just '-fastsse'.

      When compiling with Open64 using '-O3 -march=barcilona', the result is the same: a little bit lower than with just '-O3'.

      I expected it to be faster when specifying the target processor, but the result is weird.

          Both the Open64 and PGI compilers default to targeting the architecture of the machine used for the compilation when no -march (Open64) or -tp (PGI) option is given.  This differs from gcc, which defaults to generic code generation when no architecture switch is given.

          What does /proc/cpuinfo show as your processor?
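
          To answer that quickly on each node, something like the following could be used. It is only a minimal sketch: it assumes a Linux system where /proc/cpuinfo uses the usual "key : value" layout, and the helper name parse_cpuinfo is made up for this example.

```python
from pathlib import Path


def parse_cpuinfo(text):
    """Return the fields of the first processor entry as a dict."""
    info = {}
    for line in text.splitlines():
        if not line.strip():
            if info:
                break  # a blank line ends the first processor block
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only; skipped elsewhere
    if path.exists():
        cpu = parse_cpuinfo(path.read_text())
        # These fields are the ones that identify the processor generation.
        for field in ("vendor_id", "cpu family", "model", "model name"):
            print(f"{field}: {cpu.get(field, 'n/a')}")
```

          The printed family, model and model name can then be compared against the target you passed with -tp or -march.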

          For PGI, it could be that your system defaults to -tp=istanbul instead, which would explain the better performance when no -tp option is given.

          For Open64, what does "-v" show for a compilation of a simple file with your two choices?  That would show where they are different.  Also, note that the option is spelled -march=barcelona (not barcilona) but I assume that is just a typo in your posting above.
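
          If the two "-v" outputs are long, a small script can pick out the differences. This is a rough sketch, not a tool that ships with either compiler: it assumes you have saved the verbose output of each build to a text file yourself (the file names are whatever you choose), and it simply compares whitespace-separated tokens. The function name flag_diff is hypothetical.

```python
import sys


def flag_diff(verbose_a, verbose_b):
    """Return (tokens only in a, tokens only in b), each sorted."""
    tokens_a = set(verbose_a.split())
    tokens_b = set(verbose_b.split())
    return sorted(tokens_a - tokens_b), sorted(tokens_b - tokens_a)


if __name__ == "__main__":
    # Usage (file names are placeholders): python flagdiff.py with_march.txt without_march.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            only_a, only_b = flag_diff(fa.read(), fb.read())
        print("only in first: ", " ".join(only_a))
        print("only in second:", " ".join(only_b))
```

          Anything beyond the -march option itself that shows up on only one side is a candidate explanation for the performance difference.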