1 Reply Latest reply on Feb 17, 2012 9:31 AM by chipf

    ACML 5.1: same speed for ZGEMM with and without FMA4?

    andersartig

      Hello

      We are using a program that makes lots of ZGEMM calls (up to 80% of the program's running time).

      I'm testing the program on Bulldozer CPUs with ACML 5.1, once with FMA4 and once without FMA4.

      Both builds were compiled with Intel Fortran Compiler 12.

      I can't find any major difference in the run times. Is this expected, or should FMA4 speed up ZGEMM?
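
A rough way to judge what "no major difference" means here: if ZGEMM accounts for a fraction f of the runtime (up to 80% in this program) and the FMA4 library makes only ZGEMM faster by a factor s, Amdahl's law gives a whole-program speedup of 1 / ((1 - f) + f / s). The Python sketch below is a minimal illustration under that assumption; the function name `overall_speedup` and the kernel speedups it tries are hypothetical, not ACML measurements.

```python
def overall_speedup(kernel_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only a fraction of the
    runtime (here, the ZGEMM calls) becomes kernel_speedup times faster."""
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must be in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # Unaffected part keeps its time; the ZGEMM part shrinks by kernel_speedup.
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


if __name__ == "__main__":
    # 80% ZGEMM share is from the post; the kernel speedups are hypothetical.
    for s in (1.1, 1.2, 1.5):
        print(f"ZGEMM {s:.1f}x faster -> program {overall_speedup(0.8, s):.3f}x faster")
```

For example, a 1.2x faster ZGEMM at f = 0.8 yields only about 1.15x overall, so a modest kernel gain is diluted but should still be visible if the runs are timed carefully.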

       

      With best regards

      Axel

        • Re: ACML 5.1: same speed for ZGEMM with and without FMA4?
          chipf

          You should see a speedup in this case. The FMA4 ZGEMM is not as well tuned as DGEMM, but it should still provide some benefit compared to running the SSE library. If you can run the application under gdb and stop it in the ZGEMM kernel, you should see AVX/FMA4 instructions. If not, the application is somehow picking up the wrong library. If you do see FMA4 instructions (vfmaddpd, etc.), that shows we need to spend more time tuning our ZGEMM kernels.
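
The instruction check described above can be partly automated. A minimal sketch, assuming you have saved the disassembly of the ZGEMM kernel to a text file (for example by copying the output of gdb's `disassemble` command while stopped in the kernel, or the output of `objdump -d` on the library). The helper `count_fma4` is hypothetical, and its regular expression is an approximation of the AMD FMA4 mnemonic set (vfmaddpd, vfmsubps, vfnmaddsd, vfmaddsubpd, ...) written for this example, not taken from ACML documentation. FMA3 forms such as vfmadd231pd carry operand-order digits and are deliberately not counted.

```python
import re
import sys
from collections import Counter

# Approximate FMA4 mnemonic pattern: vf + (n)madd/(n)msub/maddsub/msubadd + ps/pd/ss/sd.
# FMA3 mnemonics (e.g. vfmadd231pd) contain digits and therefore do not match.
FMA4_RE = re.compile(r"\bvf(?:n?m(?:add|sub)|maddsub|msubadd)[ps][sd]\b")


def count_fma4(disassembly: str) -> Counter:
    """Count FMA4 mnemonics in gdb- or objdump-style disassembly text."""
    return Counter(m.group(0) for m in FMA4_RE.finditer(disassembly.lower()))


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        counts = count_fma4(f.read())
    if counts:
        for mnemonic, n in counts.most_common():
            print(f"{mnemonic:14s} {n}")
    else:
        print("no FMA4 instructions found (possibly the SSE library)")
```

Run it as `python count_fma4.py kernel_disas.txt` (both file names are hypothetical). An empty result for the hot kernel would be consistent with the non-FMA4 library being loaded; FMA4 hits would instead point to the kernel tuning issue mentioned in the reply.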