
Ryzen 1700X FPU weakness

Question asked by jmr on Oct 26, 2018
Latest reply on Oct 27, 2018 by elstaci

Hi,

I have written a small program in assembly that compares different implementations of the scalar product using the FPU, SSE and AVX instruction sets.

 

Basically, I fill two arrays of floats x[] and y[] and I compute the sum of x[i] * y[i].
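For reference, the computation can be sketched in C as below. This is only a minimal illustration and not the assembly program itself. The function name dot_scalar is hypothetical, and whether the compiler emits x87 or SSE instructions for it depends on the target and flags (with GCC, for example, -mfpmath=387 asks for x87 code).

```c
#include <stddef.h>

/* Hypothetical helper: scalar product of two float arrays of length n.
 * One multiply and one add per element, accumulated in a single variable,
 * which mirrors the FLD / FMUL / FADD loop described above. */
float dot_scalar(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

Note that each addition depends on the previous one, so the loop forms a single chain of dependent adds.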

 

When I use the x87 FPU instructions (FLD, FMUL, FADD), my program runs for 16 seconds. On other architectures (Intel) it generally takes 10 seconds.

When I use the SSE registers working on vectors of 4 floats, it takes only 2 seconds (I use MOVDQA, MULPS, ADDPS).

So, to be sure of what is happening, I decided to use the SSE registers computing one element at a time (MOVSS, MULSS, ADDSS), and it executes in 10 seconds.
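The two SSE variants can be sketched with C intrinsics as below. Again, this is an illustration under assumptions and not the original assembly: the function names are hypothetical, unaligned loads are used so the arrays need no special alignment, and a plain C fallback is compiled when SSE is not available.

```c
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Packed version: 4 floats per iteration (MULPS / ADDPS). */
float dot_sse_packed(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;
#if HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);   /* unaligned load of 4 floats */
        __m128 vy = _mm_loadu_ps(y + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(vx, vy));
    }
    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
#endif
    /* Remaining elements (all of them if SSE is unavailable). */
    for (; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Scalar version: one float per iteration (MOVSS / MULSS / ADDSS). */
float dot_sse_scalar(const float *x, const float *y, size_t n)
{
#if HAVE_SSE
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; ++i) {
        __m128 p = _mm_mul_ss(_mm_load_ss(x + i), _mm_load_ss(y + i));
        acc = _mm_add_ss(acc, p);          /* only the low lane is used */
    }
    return _mm_cvtss_f32(acc);
#else
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
#endif
}
```

The packed version keeps four independent partial sums and combines them at the end, so its result can differ slightly in rounding from the element-by-element version on general data.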

 

So my analysis (maybe I am wrong) is that the x87 FPU is relatively slow compared to the SSE circuitry.

 

Does anybody have any idea why?

 

Regards,

JM
