Double-precision SSE2 performance

Discussion created by danielp on Nov 27, 2007
Latest reply on Dec 30, 2007 by alexey_khosh@rambler.ru
I compiled and ran the mandel app from the article "Performance Optimization of 64-bit Windows Applications for AMD 64..." by Mike Wall on this site.

With the SSE single-precision code the performance is quite good, but with the SSE2 double-precision code it is really bad.

For example, the SSE single-precision code reached 1.5 GFLOPS on a PIII-800, 3.8 on an Athlon XP 2.2, and 5.5 on an Athlon64 3.8.

The SSE2 double-precision code on the Athlon64 3.8 reached only 1.8 GFLOPS (that is the only one of these processors that supports SSE2).
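
For context on the gap: a 128-bit SSE register holds four single-precision values but only two double-precision ones, so, other things being equal, a packed double-precision loop does half as much work per instruction. The sketch below is not the code from the article; it is a minimal illustration in C using compiler intrinsics, with hypothetical function names (mandel_step_ps, mandel_step_pd), of one Mandelbrot step z = z*z + c at both widths.

```c
#include <xmmintrin.h>  /* SSE:  __m128,  packed single precision */
#include <emmintrin.h>  /* SSE2: __m128d, packed double precision */

/* Hypothetical helper, not from the article.
   One step z = z*z + c for 4 points at once (single precision).
   zr, zi, cr, ci each point to 4 floats; zr and zi are updated in place. */
void mandel_step_ps(float *zr, float *zi, const float *cr, const float *ci)
{
    __m128 r  = _mm_loadu_ps(zr);
    __m128 i  = _mm_loadu_ps(zi);
    __m128 rr = _mm_mul_ps(r, r);   /* re^2  */
    __m128 ii = _mm_mul_ps(i, i);   /* im^2  */
    __m128 ri = _mm_mul_ps(r, i);   /* re*im */

    /* new re = re^2 - im^2 + c_re,  new im = 2*re*im + c_im */
    _mm_storeu_ps(zr, _mm_add_ps(_mm_sub_ps(rr, ii), _mm_loadu_ps(cr)));
    _mm_storeu_ps(zi, _mm_add_ps(_mm_add_ps(ri, ri), _mm_loadu_ps(ci)));
}

/* Hypothetical helper, not from the article.
   The same step for only 2 points at once (double precision).
   zr, zi, cr, ci each point to 2 doubles; zr and zi are updated in place. */
void mandel_step_pd(double *zr, double *zi, const double *cr, const double *ci)
{
    __m128d r  = _mm_loadu_pd(zr);
    __m128d i  = _mm_loadu_pd(zi);
    __m128d rr = _mm_mul_pd(r, r);
    __m128d ii = _mm_mul_pd(i, i);
    __m128d ri = _mm_mul_pd(r, i);

    _mm_storeu_pd(zr, _mm_add_pd(_mm_sub_pd(rr, ii), _mm_loadu_pd(cr)));
    _mm_storeu_pd(zi, _mm_add_pd(_mm_add_pd(ri, ri), _mm_loadu_pd(ci)));
}
```

Both functions issue the same number of arithmetic instructions, but the double-precision one advances two points per call instead of four. On that count alone, roughly half of the 5.5 figure would be the ceiling for double precision, so only the remaining shortfall down to 1.8 is what needs explaining.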

Am I missing a compiler switch, or are the VC++ optimizations for SSE2 double-precision really that bad?

P.S. I compiled and ran the code on 32-bit Vista.