    Multithreaded performance below expectations

      I have assembled a new machine with a Phenom II X4 965 BE processor, 6 GB of RAM, an ASUS motherboard, and onboard video that shares main memory.

      I am running a multithreaded program with varying numbers of threads. I have set the thread priority to maximum, set the CPU affinity, and set the scheduling policy to FIFO, but the program takes twice as long as it does on a Turion X2 RM-72.
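      For reference, here is a minimal sketch of that kind of thread setup, assuming Linux with glibc pthreads. The helper names (pin_self_to_core, request_fifo_max, is_pinned_to) are hypothetical and this is not the actual project code. Note that SCHED_FIFO normally requires root or CAP_SYS_NICE, so it is worth checking the return values rather than assuming the settings took effect.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Pin the calling thread to a single core.
 * Returns 0 on success, otherwise a positive errno-style code. */
static int pin_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* Request SCHED_FIFO at the maximum priority for the calling thread.
 * Returns 0 on success, or an error code such as EPERM when the
 * process lacks the privilege to use real-time scheduling. */
static int request_fifo_max(void)
{
    struct sched_param sp;
    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = sched_get_priority_max(SCHED_FIFO);
    return pthread_setschedparam(pthread_self(), SCHED_FIFO, &sp);
}

/* Return 1 if the calling thread's affinity mask contains only `core`,
 * 0 otherwise. Useful to confirm the pinning really happened. */
static int is_pinned_to(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return 0;
    return CPU_ISSET(core, &set) && CPU_COUNT(&set) == 1;
}
```

      Each worker thread would call pin_self_to_core() and request_fifo_max() at the start of its thread function and log any non-zero return value. Compile with gcc -pthread.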

      The Phenom version of the program is compiled as a 64-bit binary and runs on Linux OpenSuse 11.2 64-bit, while the Turion version is compiled as a 32-bit binary and runs on Linux OpenSuse 11.1 32-bit with the PAE extension.

      On the Turion, I found that a configuration with 6 threads gave the best result, which led me to assume I should use 3 threads per core. On the Phenom, however, I get the same run time with anything from 2 to 16 threads, and that time is double what the Turion takes with 6 threads.

      Any thoughts?

          Just to add some more info.

          I added a GeForce GT240 video card, and the onboard video was NOT the problem. I also ran a parallel matrix multiplication program written in C on both machines, and the Phenom was far faster.

          The problem, however, appears when I run my project, an environment that runs non-compiled programs in parallel.

          When comparing two Athlon Turion X2 processors with different clock speeds, the higher-clocked one performed better, but the Phenom is 2 times slower than the Turion X2 for this type of application.

          Any thoughts? Any help? Are there any specific architectural characteristics of the Phenom processors that could make them behave like this? If so, how can I overcome this problem?