I'm benchmarking an Opteron cluster using High-Performance Linpack (HPL).
When compiling with the PGI optimization options '-fastsse -tp=shanghai-64', the final performance is slightly LOWER than with just '-fastsse'.
When compiling with Open64 using '-O3 -march=barcelona', the result is the same: slightly lower than with just '-O3'.
I expected the code to be faster when the target processor is specified, but the results show the opposite. Why would that be?