We are using a program that makes a lot of zgemm calls (up to 80% of the program's running time).
I'm testing the program on Bulldozer CPUs with ACML 5.1, once with FMA4 and once without FMA4.
Both times the program was compiled with the Intel Fortran Compiler 12.
I cannot find any major difference in the runtimes. Is this expected, or should FMA4 speed up zgemm?
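
To make the comparison independent of the rest of the program, the time of a single zgemm call could be converted to GFLOP/s for each run. Below is a minimal sketch in Python, assuming the conventional count of 8*m*n*k real floating-point operations for a complex matrix multiply. The function names and the timings in the example are hypothetical placeholders, not measurements.

```python
def zgemm_gflops(m, n, k, seconds):
    """Achieved GFLOP/s for one zgemm call on an (m x k) times (k x n) product.

    Assumes the conventional count of 8*m*n*k real floating-point
    operations (one complex multiply-add = 6 + 2 real operations).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    flops = 8.0 * m * n * k
    return flops / seconds / 1e9


def relative_speedup(t_without_fma4, t_with_fma4):
    """Ratio of the two runtimes; a value above 1 means the FMA4 run is faster."""
    if t_with_fma4 <= 0:
        raise ValueError("t_with_fma4 must be positive")
    return t_without_fma4 / t_with_fma4


if __name__ == "__main__":
    # Hypothetical placeholder timings in seconds, NOT real measurements.
    t_plain, t_fma4 = 1.00, 0.98
    size = 2000  # hypothetical square matrix dimension
    print("without FMA4:", zgemm_gflops(size, size, size, t_plain), "GFLOP/s")
    print("with FMA4:   ", zgemm_gflops(size, size, size, t_fma4), "GFLOP/s")
    print("speedup:     ", relative_speedup(t_plain, t_fma4))
```

Comparing the two GFLOP/s figures for the same matrix sizes would show whether zgemm itself differs between the two runs, or whether a difference is hidden by the remaining part of the program.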
With best regards