I am wondering if anyone could offer insight into, or suggest methods for further investigating, the issue I am experiencing:
I develop a scientific application that makes heavy use of floating point operations. The application is single threaded, but the problem it works on is embarrassingly parallel, so multiple instances of the same executable are run with different inputs.
The command-line application is written in unmanaged C++ and built with Visual Studio 2010 using full optimization flags. In the past the code was run on Win 7 x64 on an i7 870 (Lynnfield) processor.
Win 2008 R2 HPC Server was installed on the Quad Opteron 6276 hardware. The same executable was run and its run time was twice as long as on the i7 870. I was quite disappointed, although I had read reviews about the single-thread performance of the Opteron 6200s. I tried numerous tuning flags, AVX, etc., to really no avail.
I installed Ubuntu 12 Server on the Quad Opteron 6276 and compiled the same C++ code with g++ -O3. Run time compared to the Windows HPC OS / executable was halved, back to the Win 7 i7 performance. So is the issue the VS 2010 compiler, Windows Server 2008, or something else?
I read about recent C++ improvements in Visual Studio 2012 RC, so I downloaded the trial and compiled the application, but the performance on Win HPC was just as poor as with VS 2010.
To test gcc against the VS compiler, I cross-compiled the code on Linux using the mingw-w64 g++ compiler. The mingw-w64 generated Windows binary performs as well as the VS one on the Win 7 i7, and as poorly on the Win Server Opteron 6276.
I installed an Ubuntu Server VM using VirtualBox under Win Server. Performance of the code in the Ubuntu VM was about 50% worse than when running on a native Linux OS, but 50% better than when running the Windows binary on a native Win OS. To be clear: on the Opteron 6276, the Linux binary run in an Ubuntu VM on Win Server performs 50% BETTER than native Windows code run directly on Win Server.
I installed a Win 7 x64 VM using VirtualBox under Ubuntu Server. Performance of the code in the Win 7 VM is the same poor performance as when running on the native Win Server OS.
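For anyone who wants to reproduce this kind of comparison, a minimal timing harness along the following lines could be built with each toolchain (VS, g++, mingw-w64) and run on each OS. This is only a sketch: `kernel` and `time_kernel` are hypothetical names, and the loop is an illustrative stand-in for a floating-point workload, not the actual application code. It uses `clock()` from `<ctime>` because that is available with all of the compilers mentioned; note that `clock()` reports wall-clock time with the Microsoft runtime and CPU time on Linux, which should be comparable for a single-threaded run on an otherwise idle machine.

```cpp
#include <cmath>
#include <cstddef>
#include <ctime>
#include <vector>

// Hypothetical stand-in for the real floating-point workload:
// fills a vector with values in [1, 2) and sums x * sqrt(x).
double kernel(std::size_t n) {
    std::vector<double> x(n);
    for (std::size_t i = 0; i < n; ++i)
        x[i] = 1.0 + static_cast<double>(i) / static_cast<double>(n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += std::sqrt(x[i]) * x[i];
    return sum;
}

// Runs the kernel `repeats` times and returns the elapsed seconds as
// reported by clock(). The accumulated result is written to *checksum
// so the compiler cannot discard the work as dead code.
double time_kernel(std::size_t n, int repeats, double* checksum) {
    std::clock_t start = std::clock();
    double acc = 0.0;
    for (int r = 0; r < repeats; ++r)
        acc += kernel(n);
    std::clock_t stop = std::clock();
    *checksum = acc;
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_kernel` from `main` with a problem size large enough to run for several seconds, and printing both the elapsed time and the checksum, gives one number per compiler / OS / hardware combination. Matching checksums help confirm that the builds are doing the same work.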
I installed the two Windows hotfixes that I could find that are possibly related to this issue (2645594 and 2646060). Power options are set to Maximum Performance in the BIOS, and HPC Mode is enabled in the BIOS. The Windows Maximum Performance power options are selected. CPU-Z sees a CPU speed of 2600 MHz, which corresponds to the Pb1 P-state.
I am presently downloading the RC of Windows Server 2012 to see what the native performance is under that.
For a variety of reasons we would like to continue to develop and run our application under Windows. Does anyone have any other thoughts on settings I could change to improve the native Windows performance?
Since you haven't received any replies here yet, you could try:
1. posting your question in the AMD Opteron forum: http://forums.amd.com/forum/categories.cfm?catid=29&entercat=y
2. contacting AMD support either by email (http://emailcustomercare.amd.com/) or by phone (http://support.amd.com/us/contacts/Pages/global-technical-support.aspx)
If you try these options and you still don't get your questions answered, will you please let me know?