ACML 4.1.0 vs MKL 10.q opt2356 E5530 DGEMV()

I am seeing some interesting behavior on the new Intel chips vs. Barcelonas, with both MKL and ACML, for the DGEMV kernel.


Problem size is 1010, OpenMP on 8 cores.
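For anyone comparing numbers, here is a minimal sketch of how MFlop/s figures like the ones below are usually derived. It assumes the conventional nominal flop counts (2n for DDOT, 2n^2 for DGEMV, 2n^3 for DGEMM) and that "problem size 1010" means square operands of order n = 1010. The post does not say how the benchmark counted flops, so flop_count, mflops, and seconds_per_call are illustrative names only, not part of ACML or MKL.

```python
def flop_count(kernel, n):
    """Nominal flop count for a square problem of order n (assumed convention)."""
    counts = {
        "ddot": 2 * n,        # n multiplies + n adds
        "dgemv": 2 * n * n,   # one multiply-add per matrix element
        "dgemm": 2 * n ** 3,  # one multiply-add per (i, j, k) triple
    }
    return counts[kernel]


def mflops(kernel, n, seconds):
    """Convert a measured wall-clock time for one call into MFlop/s."""
    return flop_count(kernel, n) / seconds / 1e6


def seconds_per_call(kernel, n, rate_mflops):
    """Inverse of mflops(): time per call implied by a reported rate."""
    return flop_count(kernel, n) / (rate_mflops * 1e6)


if __name__ == "__main__":
    n = 1010  # assumption: "problem size 1010" is the matrix order
    for rate in (838.0, 4435.0):
        t = seconds_per_call("dgemv", n, rate)
        print(f"{rate:7.0f} MFlop/s -> {t * 1e3:.3f} ms per DGEMV call")
```

Under those assumptions one DGEMV call is 2,040,200 flops, so 4435 MFlop/s works out to roughly 0.46 ms per call and 838 MFlop/s to roughly 2.43 ms.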


CPU        MFlop/s    BLAS Lib

opt2356     838       ACML 4.1
E5530      4435       ACML 4.1
opt2356     858       MKL
E5530      3743       MKL
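To put the table in perspective, the sketch below computes the E5530/opt2356 speedup per library and the matrix read rate each figure would imply. The read-rate part rests on an assumption that is not in the post: that DGEMV reads each 8-byte matrix element once and does 2 flops with it, i.e. 4 bytes per flop. The names results, speedup, and implied_read_rate_gb_s are made up for illustration.

```python
# MFlop/s values copied from the table above.
results = {
    ("opt2356", "ACML 4.1"): 838,
    ("E5530", "ACML 4.1"): 4435,
    ("opt2356", "MKL"): 858,
    ("E5530", "MKL"): 3743,
}

# Assumption: one 8-byte double read per matrix element, 2 flops per element.
BYTES_PER_FLOP_DGEMV = 8 / 2


def speedup(lib):
    """E5530 rate divided by opt2356 rate for the given BLAS library."""
    return results[("E5530", lib)] / results[("opt2356", lib)]


def implied_read_rate_gb_s(rate_mflops):
    """Matrix bytes per second (in GB/s) needed to sustain the given rate."""
    return rate_mflops * 1e6 * BYTES_PER_FLOP_DGEMV / 1e9


if __name__ == "__main__":
    for lib in ("ACML 4.1", "MKL"):
        print(f"{lib}: E5530 is {speedup(lib):.2f}x the opt2356")
    for (cpu, lib), rate in results.items():
        print(f"{cpu:8s} {lib:9s} {implied_read_rate_gb_s(rate):6.2f} GB/s")
```

That gives about 5.3x with ACML and 4.4x with MKL, and an implied read rate of roughly 17.7 GB/s on the E5530 against about 3.4 GB/s on the opt2356. Whether those bytes come from main memory or from cache is not something the table can tell us.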


The strange thing is that the DGEMM() and DDOT() kernels run at about the same speed on both systems, with both BLAS libraries. ACML has issues with dgemm() on the Intel and MKL has issues with dgemm() on the AMD, no surprise.


I expected the triple-channel memory bandwidth of the Intel to show a 50% improvement in ddot() and similar kernels, but I am not seeing one.
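One way to sanity-check that expectation is to look at how much data each kernel actually touches. The sketch below assumes that every kernel was run with n = 1010 on square operands and that the 50% figure comes from three memory channels versus two. Neither is stated in the post, and working_set_bytes and channel_ratio are hypothetical helper names.

```python
DOUBLE_BYTES = 8  # size of one double-precision value


def working_set_bytes(kernel, n):
    """Bytes of operand data touched by one call (assumed square operands)."""
    elements = {
        "ddot": 2 * n,           # two vectors
        "dgemv": n * n + 2 * n,  # one matrix plus input and output vectors
        "dgemm": 3 * n * n,      # three matrices
    }
    return elements[kernel] * DOUBLE_BYTES


def channel_ratio(new_channels, old_channels):
    """Peak-bandwidth ratio if bandwidth scaled purely with channel count."""
    return new_channels / old_channels


if __name__ == "__main__":
    n = 1010  # assumption: same problem size for every kernel
    for kernel in ("ddot", "dgemv", "dgemm"):
        kib = working_set_bytes(kernel, n) / 1024
        print(f"{kernel:6s} {kib:10.1f} KiB")
    print(f"3 vs 2 channels: {channel_ratio(3, 2):.1f}x")
```

With n = 1010 the DDOT operands come to about 16 KB, against about 8.2 MB for DGEMV and about 24.5 MB for DGEMM. If DDOT really was run at that size, its data would likely stay in cache, so the extra memory channel may simply not come into play. A much longer vector would be a fairer test of the 1.5x expectation.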


I do like the improved DGEMV() performance of the new Intel platform, and I wish I had tested it on a Shanghai. I also like how ACML is getting the same performance bumps in DGEMV() as MKL. Portability is nice, I must say.


Any comments would be appreciated.