Anything I can do to scale up better with multiple CPUs?
I've run a benchmark on a single computer consisting of a Tyan motherboard + daughterboard containing 8 dual-core Opteron processors (16 cores total), running Linux Fedora Core 4 or 5. Each of the eight processors has its own memory bank (NUMA architecture) with plenty of memory (<40% used). MPI is used to spread the computation over N cores. The application and case I'm trying to run scale up very linearly for everyone I've talked to, but on the system I'm using (above), the speedup is something like:

1 processor ~ 1X
4 processors ~ 2.3X (2.3 times faster than 1 processor)
8 processors ~ 3.3X
16 processors ~ system crashes, must be restarted
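
To put those numbers in perspective, here is a rough sketch that turns the speedups above into parallel efficiency (speedup divided by processor count, where 1.0 would be perfect linear scaling). The function name and the dictionary are just made up for illustration; only the three speedup figures come from my runs. The 16-processor case is left out because it never finished.

```python
# Speedups reported above, keyed by the number of processors used.
# The 16-processor run crashed, so there is no data point for it.
measured_speedup = {1: 1.0, 4: 2.3, 8: 3.3}


def parallel_efficiency(speedup, nprocs):
    """Fraction of ideal linear scaling achieved (1.0 means perfect)."""
    return speedup / nprocs


for nprocs, speedup in sorted(measured_speedup.items()):
    eff = parallel_efficiency(speedup, nprocs)
    print(f"{nprocs:2d} procs: speedup {speedup:.1f}X, efficiency {eff:.1%}")
```

So efficiency drops to a bit under 60% at 4 processors and a bit over 40% at 8, while the people reporting almost 8X on 8 processors are sitting close to 100%.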
Everyone else claims to get almost 8X for 8 processors (even 16X for 16 processors) using the same software and the same input, just on different hardware. I think these other people are using separate computers, each containing 1 or 2 single-core processors, and MPI communication for them is via 100g ethernet or faster.
Does anyone know why the system I've described above would scale up so poorly? Any ideas as to what the bottleneck might be? When I run "top" it always shows all processes using 99+% CPU, so I thought that meant it was scaling well, but when I look at the runtimes it clearly isn't.
On a single-socket dual-core I see about an 80-90% bump. That's unmeasured, but I can see how fast it renders. The 16-core test sounds like a disappointment. The software itself may be suspect in this case, so you may want to go with another.