I am looking for help on how to set thread affinities for AMD processors when using OpenMP with the Intel Compiler. I have a Fortran numerical simulation code and need to run it in parallel on many cores. Because its speed is limited by memory bandwidth, I cannot afford to have the threads jumping around from core to core, and especially not from CPU to CPU.
I was told, including by AMD staff, that the Intel Compiler is the best of all the compilers at optimizing high-performance (scientific computing) code. (I tried PGI, but could not get it to run efficiently with OpenMP.)
When I compile for Intel Nehalem, I can use set KMP_AFFINITY=verbose,granularity=fine,scatter to fix threads to cores. However, this does not work for AMD CPUs (the KMP_AFFINITY setting fails).
Thanks in advance for help on how to solve this problem. I have been trying to benchmark Magny-Cours processors as candidates for a memory-bandwidth intensive scientific computing application, but so far I simply cannot get any compiler to properly parallelize OpenMP with them.
I don't know the answer to your question about KMP_AFFINITY, though the Intel Fortran manual also lists the -par-affinity compiler option as related functionality, which might be a different way to accomplish this task.
One additional thing I'd suggest trying is the Open64 compiler. This compiler supports the O64_OMP_SET_AFFINITY and O64_OMP_AFFINITY_MAP environment variables. The latter lets you provide a specific binding of threads to CPU cores. Both are described at: http://developer.amd.com/cpu/open64/onlinehelp/pages/x86_open64_help.htm#Environment-Variables-1
In my experiments, the Open64 compiler performed better than the Intel Compiler on the SPEC OMP2001 benchmark suite. As Mike suggested in the earlier post, you can use the Open64 compiler, which is a high-performance, production-quality code generation tool designed for high-performance parallel computing workloads. It is also open source. I ran this experiment using all the available cores on an AMD Magny-Cours machine.
You can download the compiler from developer.amd.com
In my experiments, Open64 seemed to have the better overall geomean for SPEC OMP2001. Can you provide details of the suite you are experimenting with? Can you also let me know the number of processors (or sockets) on the system? I used a 2-processor system, i.e. it had 24 cores. So I would suggest you try Open64 on Magny-Cours, and please let me know what your experience is.
As suggested above, please use the O64_OMP_SET_AFFINITY and O64_OMP_AFFINITY_MAP environment variables to bind the threads and memory to the appropriate cores. This should help you get the best performance.
For the Intel compiler, as far as I know, there is no support for binding threads to cores on non-Intel processors. So with the Intel compiler, I would suggest you run as many threads as the total number of cores on your system. The OS will then handle thread and memory placement, and in most cases that should give you the best results you can get with that compiler. You can then compare this with Open64 performance (with Open64 you have the opportunity to use O64_OMP_SET_AFFINITY and O64_OMP_AFFINITY_MAP to get the best results). I would appreciate it if you could share the outcome of your experiment.
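To make the idea of an explicit thread-to-core binding concrete, here is a minimal Python sketch that only computes a "scatter"-style placement, i.e. consecutive threads go to different memory nodes so that they share memory bandwidth as little as possible. It is an illustration, not something taken from the Intel or Open64 documentation. The function name scatter_map, the layout of 4 memory nodes with 6 cores each (one way a 2-socket, 24-core machine might be organised), and the contiguous core numbering within a node are all assumptions; check the real numbering on your own system. How the resulting list is handed to a runtime (for example through O64_OMP_AFFINITY_MAP) depends on that runtime's documented syntax, which is not reproduced here.

```python
def scatter_map(n_threads, n_nodes, cores_per_node):
    """Return a list whose entry t is the core ID assigned to thread t.

    Hypothetical helper for illustration only. Assumes cores are numbered
    contiguously within each memory node: node k owns core IDs
    k*cores_per_node .. (k+1)*cores_per_node - 1.
    """
    total_cores = n_nodes * cores_per_node
    if n_threads > total_cores:
        raise ValueError("more threads than cores; threads would share a core")

    cores = []
    for t in range(n_threads):
        node = t % n_nodes    # round-robin over memory nodes first
        slot = t // n_nodes   # then move to the next core inside each node
        cores.append(node * cores_per_node + slot)
    return cores


if __name__ == "__main__":
    # Assumed layout: 4 memory nodes x 6 cores = 24 cores.
    mapping = scatter_map(8, 4, 6)
    print(",".join(str(c) for c in mapping))  # prints 0,6,12,18,1,7,13,19
```

With fewer threads than cores, this placement spreads the threads over all memory nodes before putting a second thread on any node, which is usually what a bandwidth-limited code wants. With one thread per core, every core is used exactly once, and the map mainly serves to keep each thread on a fixed core.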