I have assembled a new machine with a Phenom II X4 965 BE processor, 6 GB of RAM, an ASUS motherboard, and onboard video that shares main memory.
I am running a multithreaded program with different numbers of threads. I have set the thread priority to maximum, set the CPU affinity, and set the scheduling policy to FIFO, but the program takes twice as long as it does on a Turion X2 RM-72.
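For illustration, the settings described above amount to something like the following on Linux. This is a minimal sketch, not the program's actual code: it assumes Python's `os` scheduling calls, and the helper name `pin_and_prioritize` is hypothetical.

```python
import os

def pin_and_prioritize(cpu, use_fifo=True):
    """Pin the caller to one CPU and try to switch it to SCHED_FIFO.

    Hypothetical helper for illustration only.
    Returns (current_affinity_set, fifo_applied).
    """
    # pid 0 targets the caller (on Linux, the calling thread).
    os.sched_setaffinity(0, {cpu})

    fifo_applied = False
    if use_fifo:
        # Highest priority the kernel allows for the FIFO policy.
        max_prio = os.sched_get_priority_max(os.SCHED_FIFO)
        try:
            os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(max_prio))
            fifo_applied = True
        except PermissionError:
            # Real-time policies normally require root or CAP_SYS_NICE.
            pass

    return os.sched_getaffinity(0), fifo_applied
```

Checking the returned values is a quick way to confirm that the affinity and policy were really applied, since a rejected real-time request otherwise leaves the thread on the default policy.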
The Phenom version of the program is compiled as 64-bit and runs on 64-bit openSUSE 11.2 Linux, while the Turion version is compiled as 32-bit and runs on 32-bit openSUSE 11.1 Linux with the PAE extension.
On the Turion, I found that a 6-thread configuration gave the best result, which led me to assume I should use 3 threads per core. On the Phenom, however, every configuration from 2 to 16 threads gave the same run time, which is double what the Turion version took with 6 threads.
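A comparison like this can be tabulated with a small timing harness that runs the same total workload at each thread count. The sketch below is only an assumption about how such a sweep might look: `run_chunk` is a placeholder workload and `time_thread_counts` is a hypothetical name. In CPython the global interpreter lock serializes pure-Python CPU-bound work, so the placeholder will not show real scaling; the actual worker would take its place.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_chunk(n_items):
    """Placeholder workload (hypothetical): sum of squares over n_items."""
    return sum(i * i for i in range(n_items))

def time_thread_counts(total_items, thread_counts, work=run_chunk):
    """Return {thread_count: elapsed_seconds} for the same total workload."""
    timings = {}
    for n in thread_counts:
        # Split the fixed total so every configuration does the same work.
        base, extra = divmod(total_items, n)
        chunks = [base + (1 if i < extra else 0) for i in range(n)]

        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=n) as pool:
            # list() forces all chunks to finish before the clock stops.
            list(pool.map(work, chunks))
        timings[n] = time.perf_counter() - start
    return timings

if __name__ == "__main__":
    for n, seconds in time_thread_counts(2_000_000, [2, 4, 6, 8, 16]).items():
        print(f"{n:2d} threads: {seconds:.3f} s")
```

Keeping the total work fixed across configurations is what makes the per-thread-count times directly comparable.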
Any thoughts?