I am experiencing a huge loss of performance (up to 40 times in some cases) when I run my calculations (under openMPI/SGE 6.2u5p2) on an AMD cluster (Quad-Core AMD Opteron(tm) Processor 8376 HE, 16 cores per node). The scheduler, which is set up in fill-up mode, does not use all the cores on some nodes even when they are available, so other jobs then start on the same nodes.
I believe the solution has to do with "exclusive scheduling", but this does not seem to exist in my current openMPI installation (1.4.4). On the other hand, I have seen things like "core-binding" and "hardware locality", but I am not sure they are appropriate for my case.
Let's say a typical calculation uses 48 cores, so ideally I would like to see three full nodes running, with 16 cores each.
I understand that hardware locality or CPU binding are probably appropriate for jobs that require several processes, which is not my case, since I only have one process.
Is there any procedure I could apply to enforce a strict fill-up of all 16 cores per node, and thereby prevent other jobs from running on the free cores of the nodes I am using?
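In case it helps to reproduce what I am seeing, here is a minimal sketch of how the slot distribution of a job could be checked from inside the job itself. It assumes the usual SGE `$PE_HOSTFILE` layout (hostname, slots, queue, processor range, one host per line); the function names are my own and are only illustrative.

```python
import os

CORES_PER_NODE = 16  # cores per node on the cluster described above


def parse_pe_hostfile(text):
    """Return {hostname: slots} from PE_HOSTFILE-style text.

    Assumed line layout: "<hostname> <slots> <queue> <processor range>".
    """
    allocation = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        host, slots = fields[0], int(fields[1])
        # A host may appear more than once (e.g. several queues), so sum up.
        allocation[host] = allocation.get(host, 0) + slots
    return allocation


def partially_filled(allocation, cores_per_node=CORES_PER_NODE):
    """Return the hosts on which the job got fewer slots than a full node."""
    return {h: s for h, s in allocation.items() if s < cores_per_node}


if __name__ == "__main__":
    # SGE sets PE_HOSTFILE for parallel jobs; outside a job this does nothing.
    path = os.environ.get("PE_HOSTFILE")
    if path:
        with open(path) as fh:
            alloc = parse_pe_hostfile(fh.read())
        print("allocated slots:", alloc)
        print("partially filled nodes:", partially_filled(alloc))
```

For a 48-core job, the outcome I would like is three hosts with 16 slots each and an empty "partially filled" result; any host listed there is one where other jobs can still be scheduled next to mine.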
Thanks a lot.