I am running a parallelized electromagnetics code written in Fortran 90 on 32 cores (four 8-core Opteron processors on a single motherboard).
When I build with the Intel compiler and run with OMP_NUM_THREADS=32, all 32 cores are used.
Since the Intel compiler cannot bind threads to cores on AMD processors, I would like to use the AMD Open64 compiler (openf95). However, when I compile with it, only the first 29 of the 32 cores are used and the other three sit idle.
The code simply has a loop that is split 32 ways, so I can't see why 29 cores would be fully busy while 3 sit idle.
Can someone explain, or point me to documentation on, how the threads are allocated, e.g. whether there are initial "manager" threads that have to be skipped? I recall having to use the "dplace" command on SGI Altix machines to skip the master threads so that the remaining threads were properly distributed and bound to cores.
Thanks for any help!