
Poor performance of ACML 5.3.1 on Intel Core i7

Question asked by marcink on Nov 25, 2013
Latest reply on Dec 17, 2013 by noam1977

Hello all,

 

I have a big problem with BLAS performance in the newest ACML library on Intel processors. I have an Intel Core i7-2620M CPU. The theoretical peak floating-point performance with AVX on a single core is 27 GFLOPS (the CPU has a 3.4 GHz clock in Turbo mode). I tested the DGEMM function (dense matrix-matrix multiplication). With the Intel MKL library I achieve 43 GFLOPS on 2 cores, which is a decent 80% of peak. With ACML I only get 9 GFLOPS on 1 core and 17 GFLOPS on 2 cores. I ran the tests in MATLAB. Here are the detailed results (3 runs for each of the 4 matrix sizes tested):

 

% Intel(R) Math Kernel Library Version 10.3.11 Product Build 20120606 for Intel(R) 64 architecture applications
% dim 1000, dgemm [gflops]:  31.4 37.3 37.0
% dim 2000, dgemm [gflops]:  41.6 40.9 41.8
% dim 3000, dgemm [gflops]:  42.8 42.9 43.0
% dim 4000, dgemm [gflops]:  43.3 42.9 42.9

 

% AMD Core Math Library(TM) Version 5.3.1.182
% dim 1000, dgemm [gflops]:  9.9 16.3 15.0
% dim 2000, dgemm [gflops]:  16.3 15.8 16.8
% dim 3000, dgemm [gflops]:  17.1 17.0 16.9
% dim 4000, dgemm [gflops]:  15.9 16.1 15.4
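
For reference, the peak and efficiency figures quoted above can be checked with a short Python sketch. It assumes 8 double-precision FLOPs per cycle per core with AVX (one 4-wide add plus one 4-wide multiply per cycle) and that every core sustains the 3.4 GHz Turbo clock; both are assumptions rather than measured values, and the function names are only illustrative. The listed runs are taken to be 2-core runs.

```python
# Assumptions (not measured): 8 double-precision FLOPs per cycle per core
# with AVX, and a sustained 3.4 GHz Turbo clock on every active core.
CLOCK_GHZ = 3.4
FLOPS_PER_CYCLE = 8


def peak_gflops(cores):
    """Theoretical peak in GFLOPS for the given number of cores."""
    return CLOCK_GHZ * FLOPS_PER_CYCLE * cores


def efficiency(measured_gflops, cores):
    """Fraction of theoretical peak reached by a measured rate."""
    return measured_gflops / peak_gflops(cores)


def dgemm_gflops(n, seconds):
    """GFLOPS of an n x n by n x n DGEMM, counted as 2*n^3 operations."""
    return 2.0 * n**3 / seconds / 1e9


# Best dim-4000 results from the runs above, assumed to use 2 cores.
for name, gflops in [("MKL", 43.3), ("ACML", 16.1)]:
    print(f"{name}: {gflops} GFLOPS, {efficiency(gflops, 2):.0%} of peak")
```

Under these assumptions the single-core peak is 27.2 GFLOPS, so MKL lands near 80% of the 2-core peak while ACML stays around 30%.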

 

Does anyone know the reason for the poor performance? I saw this (old) thread posted earlier: http://devgurus.amd.com/thread/104976. Some people from AMD claimed there that efforts were being made to make sure ACML runs efficiently on other platforms. Is this still the case?

 

Thanks!
