
Journeyman III

double-precision SSE2 performance

I compiled and ran the mandel app from the article "Performance Optimization of 64-bit Windows Applications for AMD 64..." by Mike Wall on this site.

The SSE single-precision code performs quite well, but the SSE2 double-precision code performs really badly.

For example, the SSE single-precision code ran at 1.5 GFLOPS on a PIII-800, 3.8 on an Athlon XP 2.2, and 5.5 on an Athlon64 3.8.

The SSE2 double-precision code ran at only 1.8 GFLOPS on the Athlon64 3.8, which is the only one of these processors that supports SSE2.

Am I missing a compiler switch, or are the VC++ optimizations for SSE2 double precision really that bad?

P.S. I compiled and ran the code on 32-bit Vista.
1 Reply
Journeyman III

double-precision SSE2 performance

I have heard that the quality of MSVC's SSE2 code is not very good. You can look at the assembly listing of the generated code if you want to know why. BTW, there are several other compilers besides MSVC: Intel, PGI, PathScale, GCC and so on. Have you tried any of them?
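
To have something concrete to compare the compiler's listing against, here is a minimal sketch of a double-precision Mandelbrot kernel written directly with SSE2 intrinsics. It is not the code from the article: the function name `mandel2_sse2` and its signature are made up for illustration. It also shows one fixed cost of double precision: a 128-bit register holds two doubles instead of four floats, so each packed instruction does half as many operations. That alone would explain a factor of two, not the whole gap reported above.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (__m128d, _mm_*_pd) */

/* Hypothetical helper, not taken from the article.
   Iterates z = z*z + c for two points at once (one per 64-bit lane)
   and stores, for each point, how many iterations had |z|^2 <= 4. */
static void mandel2_sse2(const double cx[2], const double cy[2],
                         int max_iter, int out[2])
{
    __m128d cr    = _mm_loadu_pd(cx);      /* real parts of c      */
    __m128d ci    = _mm_loadu_pd(cy);      /* imaginary parts of c */
    __m128d zr    = _mm_setzero_pd();
    __m128d zi    = _mm_setzero_pd();
    __m128d four  = _mm_set1_pd(4.0);
    __m128d one   = _mm_set1_pd(1.0);
    __m128d count = _mm_setzero_pd();
    /* 0.0 == 0.0 is true in both lanes, giving an all-ones mask. */
    __m128d active = _mm_cmpeq_pd(zr, zr);
    double tmp[2];
    int i;

    for (i = 0; i < max_iter; ++i) {
        __m128d zr2 = _mm_mul_pd(zr, zr);
        __m128d zi2 = _mm_mul_pd(zi, zi);
        __m128d zri = _mm_mul_pd(zr, zi);

        /* A lane stays active only while |z|^2 <= 4. */
        __m128d inside = _mm_cmple_pd(_mm_add_pd(zr2, zi2), four);
        active = _mm_and_pd(active, inside);
        if (_mm_movemask_pd(active) == 0)
            break;                         /* both points have escaped */

        /* Add 1.0 to the counter of each active lane only. */
        count = _mm_add_pd(count, _mm_and_pd(active, one));

        /* z = z*z + c: re = zr^2 - zi^2 + cr, im = 2*zr*zi + ci */
        zr = _mm_add_pd(_mm_sub_pd(zr2, zi2), cr);
        zi = _mm_add_pd(_mm_add_pd(zri, zri), ci);
    }

    _mm_storeu_pd(tmp, count);
    out[0] = (int)tmp[0];
    out[1] = (int)tmp[1];
}
```

Compiling this with MSVC's `/FAs` switch (or `gcc -S`) produces an assembly listing. If the inner loop in that listing is full of loads and stores to the stack rather than staying in the XMM registers, that would point to the compiler's code generation rather than to SSE2 itself.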