6 Replies Latest reply on Dec 17, 2013 10:15 AM by noam1977

    Poor performance of ACML 5.3.1 on Intel Core i7

    marcink

      Hello all,

       

      I have a big problem with the performance of BLAS in the newest ACML library on Intel processors. I have an Intel Core i7-2620M CPU. The theoretical peak FP performance with AVX on a single CPU core is 27 GFLOPs (the CPU has a 3.4 GHz clock in Turbo mode). I tested the DGEMM function (dense matrix-matrix multiplication). With the Intel MKL library I achieve 43 GFLOPs running on 2 cores, which is a decent 80% of peak. With ACML I only get 9 GFLOPs on 1 core and 17 GFLOPs on 2 cores. I ran the tests in MATLAB. Here are the detailed results (3 runs for each matrix size, 4 different matrix sizes):

       

      % Intel(R) Math Kernel Library Version 10.3.11 Product Build 20120606 for Intel(R) 64 architecture applications
      % dim 1000, dgemm [gflops]:  31.4 37.3 37.0
      % dim 2000, dgemm [gflops]:  41.6 40.9 41.8
      % dim 3000, dgemm [gflops]:  42.8 42.9 43.0
      % dim 4000, dgemm [gflops]:  43.3 42.9 42.9

       

      % AMD Core Math Library(TM) Version 5.3.1.182
      % dim 1000, dgemm [gflops]:  9.9 16.3 15.0
      % dim 2000, dgemm [gflops]:  16.3 15.8 16.8
      % dim 3000, dgemm [gflops]:  17.1 17.0 16.9
      % dim 4000, dgemm [gflops]:  15.9 16.1 15.4
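
The peak and efficiency figures quoted above can be reproduced with a few lines of arithmetic. The sketch below is in Python rather than MATLAB, and it assumes 8 double-precision FLOPs per cycle per core (one 4-wide AVX add plus one 4-wide AVX multiply), which is what the quoted 27 GFLOPs at 3.4 GHz implies. The function names are made up for illustration.

```python
def peak_gflops(clock_ghz, cores, flops_per_cycle=8):
    """Theoretical peak in GFLOPs: clock (GHz) x FLOPs per cycle x cores.

    flops_per_cycle=8 is an assumption (4-wide AVX add + 4-wide AVX multiply
    per cycle, double precision); adjust it for other microarchitectures.
    """
    return clock_ghz * flops_per_cycle * cores


def efficiency(measured_gflops, clock_ghz, cores):
    """Fraction of the theoretical peak reached by a measured GFLOP rate."""
    return measured_gflops / peak_gflops(clock_ghz, cores)


if __name__ == "__main__":
    # Numbers taken from the post: 3.4 GHz Turbo clock, 2 cores.
    print(f"peak, 1 core : {peak_gflops(3.4, 1):.1f} GFLOPs")
    print(f"peak, 2 cores: {peak_gflops(3.4, 2):.1f} GFLOPs")
    print(f"MKL  at 43 GFLOPs: {efficiency(43.0, 3.4, 2):.0%} of peak")
    print(f"ACML at 17 GFLOPs: {efficiency(17.0, 3.4, 2):.0%} of peak")
```

With these inputs MKL's 43 GFLOPs on two cores comes to about 79% of peak and ACML's 17 GFLOPs to about 31%. The Turbo clock with both cores busy may be lower than 3.4 GHz, in which case the real efficiencies would be somewhat higher.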

       

      Does anyone know why the performance is so poor? I have seen this (old) thread posted earlier: http://devgurus.amd.com/thread/104976. Some people from AMD claimed that efforts were being made to make sure ACML runs efficiently on other platforms. Is this still the case?

       

      Thanks!

        • Re: Poor performance of ACML 5.3.1 on Intel Core i7
          kknox

          Hi Marcink~

           

          I do not expect such poor performance; this will have to be debugged on our side.

           

          How are you swapping the BLAS library underneath MATLAB?  Is there a script associated with your timings?

           

          If you have the time and are interested, could you try your tests with ACML 4.4.0?

          http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/acml-archive-downloads/

           

          Kent

            • Re: Poor performance of ACML 5.3.1 on Intel Core i7
              marcink

              Hi, kknox,

               

              Thanks for your answer. I have tried ACML 4.4.0, and the results are virtually the same as with 5.3.1. Using top, I have verified that ACML runs on two CPU cores in the parallel case. I change the BLAS by setting the BLAS_VERSION environment variable, e.g.

               

              LD_LIBRARY_PATH=/path/to/acml4.4.0/gfortran64_mp_int64/lib/ BLAS_VERSION=/path/to/acml4.4.0/gfortran64_mp_int64/lib/libacml_mp.so matlab

               

              The script is really simple:

               

              S = 1000:1000:4000;              % matrix dimensions to test
              nruns = 3;                       % timed repetitions per dimension
              for s = 1:numel(S)
                  n = S(s);
                  A = rand(n);
                  B = A';

                  fprintf('dim %d, dgemm [gflops]: ', n);
                  for r = 1:nruns
                      t = tic;
                      C = A*B;                 % dense matrix-matrix product
                      t = toc(t);
                      % an n-by-n product costs 2*n^3 floating-point operations
                      fprintf(' %.1f', 2*n^3/t/1e9);
                  end
                  fprintf('\n');
              end
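
The GFLOP figures the script prints follow from the usual operation count for a dense product: multiplying two n-by-n matrices takes 2*n^3 floating-point operations, so the rate is 2*n^3 / t / 1e9 for a run that took t seconds. A minimal Python sketch of the same bookkeeping is shown here; `dgemm_gflops` and `time_runs` are hypothetical helper names, not part of MATLAB or any BLAS library.

```python
import time


def dgemm_gflops(n, seconds):
    """GFLOP rate of an (n x n) * (n x n) product that took `seconds`."""
    return 2.0 * n**3 / seconds / 1e9


def time_runs(fn, runs=3):
    """Call fn() `runs` times and return the wall-clock time of each call."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return times


if __name__ == "__main__":
    # A 1000 x 1000 product that takes 2 seconds runs at 1 GFLOP/s.
    print(dgemm_gflops(1000, 2.0))
```

Any callable that performs the product can be passed to `time_runs`, and each returned time converted with `dgemm_gflops`. Reporting every run separately, as the script does, keeps warm-up effects visible: in the results above, the first run at dim 1000 is noticeably slower than the next two for both libraries.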

               

              To check my results I have also substituted MATLAB's BLAS with the latest OpenBLAS, and the results are (exactly) like those I obtain with MKL, so it is likely not a problem with MATLAB's BLAS interface.

               

              Marcin