Anything I can do to scale up better with multiple CPUs?
I've run a benchmark on a single computer consisting of a Tyan motherboard plus daughterboard holding 8 dual-core Opteron processors (16 cores total), running Linux (Fedora Core 4 or 5). Each of the eight processors has its own memory bank (NUMA architecture) with plenty of memory (<40% used). MPI is used to spread the computation over N cores. The application and case I'm trying to run scale almost linearly for everyone I've talked to, but on the system described above the speedup is something like:
1 processor ~ 1X
4 processors ~ 2.3X (2.3 times faster than 1 processor)
8 processors ~ 3.3X
16 processors ~ system crashes, must be restarted
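To put numbers on how far this is from linear, here is a rough sketch that turns the speedups above into parallel efficiency (speedup divided by process count) and the Karp-Flatt serial fraction. The function names are just mine for illustration, and the Karp-Flatt figure assumes the slowdown behaves like a fixed serial/overhead fraction, which may not be true here.

```python
# Speedups reported above, keyed by number of MPI processes.
# The 16-process run crashed, so there is no data point for it.
measured = {1: 1.0, 4: 2.3, 8: 3.3}


def efficiency(n, speedup):
    """Parallel efficiency: 1.0 would be perfectly linear scaling."""
    return speedup / n


def karp_flatt(n, speedup):
    """Karp-Flatt metric: the apparent serial/overhead fraction
    implied by a measured speedup on n processes (n > 1)."""
    return (1.0 / speedup - 1.0 / n) / (1.0 - 1.0 / n)


def report(data):
    """Return (n, speedup, efficiency, serial_fraction) rows."""
    rows = []
    for n, s in sorted(data.items()):
        frac = karp_flatt(n, s) if n > 1 else None
        rows.append((n, s, efficiency(n, s), frac))
    return rows


for n, s, eff, frac in report(measured):
    frac_text = "n/a" if frac is None else f"{frac:.3f}"
    print(f"{n:2d} procs: speedup {s:.1f}X, "
          f"efficiency {eff:.1%}, apparent serial fraction {frac_text}")
```

By this measure the efficiency is 57.5% at 4 processes and about 41% at 8, with an apparent serial fraction somewhere around 0.20 to 0.25 in both cases.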
Everyone else claims to get almost 8X for 8 processors (even 16X for 16 processors) using the same software and the same input, just on different hardware. I think these other people are using separate computers, each containing 1 or 2 single-core processors, with MPI communication over 100g Ethernet or faster.
Does anyone know why the system I've described above would scale so poorly? Any ideas as to what the bottleneck might be? When I run "top" it always shows all processes using 99+% CPU, so I thought that meant it was scaling well, but the runtimes clearly say otherwise.
Thanks