I have assembled a new machine with a Phenom II X4 965 BE processor, 6 GB of RAM, an ASUS motherboard, and onboard video that shares main memory.
I am running a multithreaded program with varying numbers of threads. I have set the thread priority to maximum, set CPU affinity, and set the scheduling policy to FIFO, but the program takes twice as long as it does on a Turion X2 RM-72.
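For reference, here is a minimal sketch of how those three settings (maximum priority, CPU affinity, FIFO policy) are typically applied to a single pthread on Linux. It assumes glibc, where pthread_setaffinity_np is a non-portable GNU extension; the helper names first_allowed_cpu and pin_and_prioritize are hypothetical, not taken from the original program. SCHED_FIFO usually requires root or CAP_SYS_NICE, so an unprivileged run gets EPERM.

```c
#define _GNU_SOURCE          /* needed for pthread_setaffinity_np and CPU_SET */
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/* Return the lowest CPU index the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof set, &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Pin thread t to one CPU, then switch it to SCHED_FIFO at the maximum
   FIFO priority. Returns 0 on success, or the error number from the
   first call that failed (EPERM is typical for SCHED_FIFO without root
   or CAP_SYS_NICE). */
int pin_and_prioritize(pthread_t t, int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    int rc = pthread_setaffinity_np(t, sizeof set, &set);
    if (rc != 0)
        return rc;

    struct sched_param sp = { 0 };
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return pthread_setschedparam(t, SCHED_FIFO, &sp);
}
```

Checking the return value matters here. If the FIFO call fails with EPERM and the error is ignored, the threads keep running under the normal scheduler, so the two machines may not be running under the settings you think they are.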
On the Phenom, the program is compiled as 64-bit and runs on 64-bit openSUSE 11.2; on the Turion, it is compiled as 32-bit and runs on 32-bit openSUSE 11.1 with PAE.
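One concrete difference between the two builds: on Linux, a 32-bit build uses 4-byte pointers and longs, while a 64-bit build uses 8-byte ones. Pointer-heavy data structures, which are typical in an interpreter, therefore roughly double in size and put more pressure on the caches. The hypothetical node type below only illustrates the effect; it is not from the poster's code.

```c
#include <stddef.h>

/* Hypothetical pointer-heavy node, e.g. a list cell or object header
   in an interpreter. */
struct node {
    struct node *next;
    long value;
};

/* Size of one node in the current build: 8 bytes on 32-bit Linux
   (ILP32), 16 bytes on 64-bit Linux (LP64). */
size_t node_bytes(void)
{
    return sizeof(struct node);
}

/* Bytes needed for n nodes, ignoring allocator overhead. */
size_t list_bytes(size_t n)
{
    return n * node_bytes();
}
```

Compiling the same file with gcc -m32 and gcc -m64 and printing node_bytes() shows the difference directly.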
On the Turion, I found that 6 threads gave the best result, which suggested using 3 threads per core. On the Phenom, however, every thread count from 2 to 16 gave the same time, roughly double what the Turion took with 6 threads.
Any thoughts?
Just to add some more information:
I added a GeForce GT240 video card, and the shared onboard video was NOT the problem. I also ran a parallel matrix-multiplication program written in C on both machines, and the Phenom was far faster.
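A parallel matrix multiplication like the one mentioned is often structured as in this sketch. It assumes row-major n x n doubles with rows split evenly across pthreads; the names are illustrative, not the poster's code. Each thread reads shared, read-only inputs and writes a disjoint block of rows, so threads almost never write to the same cache lines. That is one reason such a benchmark can scale well even when a program with more shared, mutable state does not.

```c
#include <pthread.h>
#include <stdlib.h>

/* Arguments for one worker: multiply rows [row_begin, row_end). */
struct mm_args {
    const double *a, *b;
    double *c;
    int n, row_begin, row_end;
};

static void *mm_worker(void *p)
{
    struct mm_args *m = p;
    for (int i = m->row_begin; i < m->row_end; i++)
        for (int j = 0; j < m->n; j++) {
            double sum = 0.0;
            for (int k = 0; k < m->n; k++)
                sum += m->a[i * m->n + k] * m->b[k * m->n + j];
            m->c[i * m->n + j] = sum;  /* each thread writes only its own rows */
        }
    return NULL;
}

/* C = A * B for n x n row-major matrices, with rows split across
   nthreads pthreads. Returns 0 on success, -1 on failure. */
int matmul_parallel(const double *a, const double *b, double *c,
                    int n, int nthreads)
{
    if (n < 1 || nthreads < 1)
        return -1;
    if (nthreads > n)
        nthreads = n;
    pthread_t *tids = malloc(sizeof *tids * (size_t)nthreads);
    struct mm_args *args = malloc(sizeof *args * (size_t)nthreads);
    if (!tids || !args) {
        free(tids);
        free(args);
        return -1;
    }

    int started = 0, rc = 0;
    for (int t = 0; t < nthreads; t++) {
        args[t] = (struct mm_args){ a, b, c, n, n * t / nthreads,
                                    n * (t + 1) / nthreads };
        if (pthread_create(&tids[t], NULL, mm_worker, &args[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(tids[t], NULL);
    free(tids);
    free(args);
    return rc;
}
```

For example, with a = {1, 2, 3, 4} and b = {5, 6, 7, 8} as 2x2 matrices, matmul_parallel(a, b, c, 2, 2) fills c with {19, 22, 43, 50}.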
The problem, however, appears when I run my own project, an environment that runs interpreted (non-compiled) programs in parallel.
When I compared two Athlon Turion X2 processors with different clock speeds, the faster clock gave better performance, but the Phenom is twice as slow as the Turion X2 for this type of application.
Any thoughts or help? Is there some architectural characteristic of Phenom processors that could make the program behave like this? If so, how can I work around it?
Well, no, I didn't try running a 32-bit version of the app.
Let me try to explain the app: it is a parallel implementation of an interpreted language's virtual machine, which uses pthreads across multiple cores so that programmers can take advantage of real parallelism.
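To make this architecture concrete, here is a toy sketch of one common layout. Each pthread runs its own interpreter loop with a private operand stack, while the bytecode is shared and read-only. The opcodes, struct names, and 16-thread cap are hypothetical and not taken from the poster's VM, and there is no stack overflow checking. In a real VM, the parts that are not private (heap, symbol tables, garbage collector, locks) are where cross-core traffic comes from.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack VM (not the poster's VM). */
enum op { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

struct instr {
    enum op op;
    long arg;
};

struct vm_thread {
    const struct instr *code;  /* shared, read-only program */
    long stack[64];            /* private operand stack (no overflow check) */
    long result;
};

/* Interpreter loop: fetch, decode, execute until OP_HALT. */
static void *vm_run(void *p)
{
    struct vm_thread *vm = p;
    int sp = 0;
    for (size_t pc = 0;; pc++) {
        const struct instr *in = &vm->code[pc];
        switch (in->op) {
        case OP_PUSH: vm->stack[sp++] = in->arg; break;
        case OP_ADD:  sp--; vm->stack[sp - 1] += vm->stack[sp]; break;
        case OP_MUL:  sp--; vm->stack[sp - 1] *= vm->stack[sp]; break;
        case OP_HALT:
            vm->result = sp > 0 ? vm->stack[sp - 1] : 0;
            return NULL;
        }
    }
}

/* Run the same program on nthreads independent interpreter instances,
   one pthread each. Returns 0 on success, -1 on failure. */
int run_vms(const struct instr *code, struct vm_thread *vms, int nthreads)
{
    pthread_t tids[16];
    if (nthreads < 1 || nthreads > 16)
        return -1;
    int started = 0;
    for (int i = 0; i < nthreads; i++) {
        vms[i].code = code;
        if (pthread_create(&tids[i], NULL, vm_run, &vms[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == nthreads ? 0 : -1;
}
```

For example, the program PUSH 2, PUSH 3, ADD, PUSH 7, MUL, HALT leaves result == 35 in every instance.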
However, the code runs better on the Turion. I tried the 64-bit version on both a Turion and a Phenom machine, and the Turion was faster; the 32-bit build on the Turion was also faster than the 64-bit build on the Phenom.
Well... that didn't help much. Could you post the benchmark code?
I know you have probably already checked this, but... have you looked at the actual clock speed the Phenom II is running at?
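A quick way to check this from inside a program: on x86 Linux, /proc/cpuinfo has a "cpu MHz" line per core showing its current frequency. The hypothetical helper below returns the highest value it finds, or -1.0 if the file or field is missing. Sample it while the benchmark is running, because with frequency scaling (Cool'n'Quiet / cpufreq) an idle core can report a much lower clock than its rated speed.

```c
#include <stdio.h>
#include <string.h>

/* Return the highest "cpu MHz" value listed in a cpuinfo-style file
   (one such line per core on x86 Linux), or -1.0 if the file cannot
   be opened or has no such lines. */
double max_reported_mhz(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1.0;

    char line[256];
    double best = -1.0;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "cpu MHz", 7) != 0)
            continue;
        const char *colon = strchr(line, ':');
        double mhz;
        if (colon && sscanf(colon + 1, "%lf", &mhz) == 1 && mhz > best)
            best = mhz;
    }
    fclose(f);
    return best;
}
```

Calling max_reported_mhz("/proc/cpuinfo") during a run should show something close to the rated 3.4 GHz of a Phenom II X4 965 if the chip is not being held at a lower P-state.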
Judging from the app you describe and the performance numbers, I believe it has something to do with memory access patterns.