1 Reply Latest reply on Oct 27, 2018 6:05 PM by elstaci

    Ryzen 1700x FPU weakness

    jmr

      Hi,

      I have written a small program in assembly that compares different implementations of the scalar product using the FPU, SSE and AVX instruction sets.

       

      Basically I fill two arrays of floats, x[] and y[], and I compute the sum of x[i] * y[i].
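
      For readers who want the computation spelled out, here is a minimal C sketch of that scalar product. It is only an illustration, not the original assembly program, and the function name `dot_scalar` is hypothetical.

```c
#include <stddef.h>

/* Scalar product of two float arrays: sum of x[i] * y[i] for i in [0, n).
 * Illustrative only; the program discussed in this thread is hand-written assembly. */
float dot_scalar(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];   /* one multiply and one add per element */
    }
    return sum;
}
```

      Which instructions a compiler emits for this loop (x87, scalar SSE, or packed SSE/AVX) depends on the target and the optimization flags, so timings from it are not directly comparable to the assembly versions.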

       

      When I use the FPU instructions (FLD, FMUL, FADD), my program runs for 16 seconds. On other architectures (Intel) it generally takes 10 seconds.

      When I use the SSE registers working with vectors of 4 floats, it takes only 2 seconds (I use MOVDQA, MULPS, ADDPS).

      So, to be sure of what is happening, I decided to use the SSE registers computing one element at a time (MOVSS, MULSS, ADDSS), and it executes in 10 seconds.
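
      The difference between the packed and the one-element-at-a-time SSE variants can be sketched with C intrinsics. This is a hedged illustration assuming an x86 compiler that provides `<xmmintrin.h>`. The function names `dot_sse_packed` and `dot_sse_single` are hypothetical, and the sketch uses unaligned loads (`_mm_loadu_ps`) rather than the MOVDQA aligned move mentioned above, so it works with any pointer.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target */

/* Packed variant: four products per iteration (corresponds to MULPS/ADDPS). */
float dot_sse_packed(const float *x, const float *y, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 vx = _mm_loadu_ps(x + i);          /* load 4 floats, no alignment needed */
        __m128 vy = _mm_loadu_ps(y + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(vx, vy));
    }
    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    /* Leftover elements when n is not a multiple of 4. */
    for (; i < n; i++) {
        sum += x[i] * y[i];
    }
    return sum;
}

/* Single-element variant: one product per iteration (corresponds to MULSS/ADDSS). */
float dot_sse_single(const float *x, const float *y, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    for (size_t i = 0; i < n; i++) {
        __m128 vx = _mm_load_ss(x + i);           /* load 1 float into the low lane */
        __m128 vy = _mm_load_ss(y + i);
        acc = _mm_add_ss(acc, _mm_mul_ss(vx, vy));
    }
    return _mm_cvtss_f32(acc);                    /* extract the low lane */
}
```

      The packed version does a quarter of the loop iterations, which is consistent with it being several times faster. The single-element version does the same amount of work per element as the FPU loop, which makes it the fairer comparison against FLD/FMUL/FADD.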

       

      So my analysis (maybe I am wrong) is that the FPU is relatively slow compared to SSE circuitry.

       

      Does anybody have any idea why?

       

      Regards,

      JM

        • Re: Ryzen 1700x FPU weakness
          elstaci

          Since you are a developer/programmer, you may want to try asking at the AMD Server Forum: AMD Server Gurus. Your program doesn't concern a server, but someone there may know enough about the Ryzen instruction sets to answer your question.

           

          You could also try the AMD OpenCL & Vulkan Forum, where many programmers are active; someone there might know. Here is the AMD OpenCL & Vulkan Forum: OpenCL. The AMD moderator dipak may be able to direct you to the correct forum or source to answer your question.

           

          Another website that may be useful is GitHub.

           

          If not, you can always open an AMD EMAIL SUPPORT Ticket and see what they say: Online Service Request | AMD