2 replies. Latest reply on Nov 26, 2012, 11:35 AM by wesley.emeneker

    Using ACML with Matlab

    wesley.emeneker

      We want to use ACML 5.1 or 5.2 with MATLAB, since ACML is much faster than MKL (which MATLAB uses by default) on Interlagos and Abu Dhabi processors.

      I was able to get MATLAB R2012b and R2011b to use ACML 5.2.0 by setting these environment variables before starting MATLAB:

       

      export LAPACK_VERBOSITY=1

      export BLAS_VERBOSITY=1

      export BLAS_VERSION="/usr/local/packages/acml/5.2.0/ifort/ifort64_fma4_mp/lib/libacml_mp.so,acmlcompat.so"

      export LAPACK_VERSION="/usr/local/packages/acml/5.2.0/ifort/ifort64_fma4_mp/lib/libacml_mp.so,acmlcompat.so"
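      A minimal sketch of the same setup as a reusable script, assuming a POSIX shell and the install path from the post. It builds BLAS_VERSION and LAPACK_VERSION from one directory so the two values cannot drift apart. The helper name set_matlab_acml is hypothetical, and the "acmlcompat.so" entry is kept exactly as in the original exports.

```shell
#!/bin/sh
# Sketch: export the variables MATLAB reads to pick an alternative BLAS/LAPACK.
# ACML_LIB is the directory used in the post above; adjust it for your install.
ACML_LIB=/usr/local/packages/acml/5.2.0/ifort/ifort64_fma4_mp/lib

# Hypothetical helper: set both library variables from a single directory.
set_matlab_acml() {
  libdir=$1
  if [ ! -f "$libdir/libacml_mp.so" ]; then
    # Warn only, so the variables are still exported and can be inspected.
    echo "warning: $libdir/libacml_mp.so not found" >&2
  fi
  export BLAS_VERSION="$libdir/libacml_mp.so,acmlcompat.so"
  export LAPACK_VERSION="$BLAS_VERSION"
  export BLAS_VERBOSITY=1
  export LAPACK_VERBOSITY=1
}

set_matlab_acml "$ACML_LIB"
# Start MATLAB from this same shell so it inherits the variables, e.g.:
#   matlab -nodisplay
```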

       

      MATLAB happily loaded the libacml_mp.so library I provided, but performance was extremely slow when solving x = A\B with a 10,000 x 10,000 matrix.

      The serial version of ACML ran faster under MATLAB than the threaded version, but it was still much slower than expected.

      I tried removing "acmlcompat.so" from the library list, but that made no difference.

       

      With the verbosity variables set to 1, MATLAB confirmed that the correct libraries were being loaded. Setting them to 0 made no difference in performance.

       

      What am I doing wrong?

       

      Thanks,

      Wesley

        • Re: Using ACML with Matlab
          chipf

          There could be a couple of things going on. First, you may need to set OMP_NUM_THREADS to the number of cores on the system. If "top" shows all threads busy, that part is working as expected. But even if it is running with just one thread, it shouldn't be slower than the single-threaded library.
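          A minimal sketch of the OMP_NUM_THREADS suggestion, assuming a Linux shell where getconf is available. It uses getconf _NPROCESSORS_ONLN rather than nproc, because GNU nproc itself honors an already-set OMP_NUM_THREADS. The top invocation in the comment is one way to watch per-thread CPU use; the MATLAB process ID is left as a placeholder.

```shell
#!/bin/sh
# Sketch: give the threaded ACML one OpenMP thread per online core.
# getconf is used instead of nproc, since GNU nproc reports an already-set
# OMP_NUM_THREADS instead of the real core count.
OMP_NUM_THREADS=$(getconf _NPROCESSORS_ONLN)
export OMP_NUM_THREADS
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Then start MATLAB from this shell. While x = A\B is running, top's thread
# mode shows whether every thread is busy:
#   top -H -p <MATLAB pid>
```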


          If you are not running the latest kernel, you might try turning off address space randomization, as described in this GCC wiki article:

          http://gcc.gnu.org/wiki/Randomization
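          A small sketch for checking the current setting on Linux, assuming the standard kernel.randomize_va_space sysctl. The commands that actually turn it off are left as comments because they change system state; setarch -R disables randomization only for the program it launches.

```shell
#!/bin/sh
# Sketch: report the kernel's address space layout randomization (ASLR) mode.
# 0 = off, 1 = randomize stack/mmap/VDSO, 2 = also randomize the heap (brk).
f=/proc/sys/kernel/randomize_va_space
if [ -r "$f" ]; then aslr=$(cat "$f"); else aslr=unknown; fi
echo "kernel.randomize_va_space = $aslr"

# Ways to turn it off (not run here):
#   as root, until reboot:     sysctl -w kernel.randomize_va_space=0
#   persistently:              add "kernel.randomize_va_space = 0" to /etc/sysctl.conf
#   for one MATLAB session:    setarch "$(uname -m)" -R matlab
```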

           

          With 5.2.0, you can also try the non-FMA4 library. The library in ifort64_mp/lib will use FMA4 GEMM kernels if it detects the FMA4 instruction set, which might remove the need for a Bulldozer-specific build. This by itself won't solve the problem, but it might make configuration a bit easier once the performance issue is resolved.
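          A sketch of trying the non-FMA4 build while changing nothing else, so any performance difference can be attributed to the library alone. The ifort64_mp directory name is an assumption based on ACML's usual layout next to ifort64_fma4_mp; check what your install actually contains.

```shell
#!/bin/sh
# Sketch: point MATLAB at the generic OpenMP build instead of the FMA4 build.
# ifort64_mp is an assumed directory name; verify it on your system.
ACML_ROOT=/usr/local/packages/acml/5.2.0/ifort
export BLAS_VERSION="$ACML_ROOT/ifort64_mp/lib/libacml_mp.so,acmlcompat.so"
export LAPACK_VERSION="$BLAS_VERSION"

if [ ! -f "$ACML_ROOT/ifort64_mp/lib/libacml_mp.so" ]; then
  echo "warning: $ACML_ROOT/ifort64_mp/lib/libacml_mp.so not found" >&2
fi
```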

            • Re: Using ACML with Matlab
              wesley.emeneker

              I tried several thread counts (1, 2, 8, and 64 cores) and affinity settings with numactl. None of that made a difference.
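              A sketch of a thread-count sweep like the one described, assuming a POSIX shell with numactl and matlab on the PATH. The benchmark expression (including a single right-hand side B) and the --interleave=all memory policy are assumptions, not the exact settings used in this thread. By default it only prints each command line; run it with DRY_RUN=0 to launch MATLAB.

```shell
#!/bin/sh
# Sketch: time x = A\B in MATLAB at several OpenMP thread counts, with memory
# interleaved across NUMA nodes. DRY_RUN=1 (the default) only prints commands.
: "${DRY_RUN:=1}"
BENCH='A = rand(10000); B = rand(10000, 1); tic; x = A\B; toc, exit'

# Print the command line used for a given thread count.
bench_cmd() {
  printf 'OMP_NUM_THREADS=%s numactl --interleave=all matlab -nodisplay -nosplash -r "%s"\n' "$1" "$BENCH"
}

for t in 1 2 8 64; do
  if [ "$DRY_RUN" = 1 ]; then
    bench_cmd "$t"
  else
    # The assignment prefix passes OMP_NUM_THREADS through numactl to MATLAB.
    OMP_NUM_THREADS=$t numactl --interleave=all matlab -nodisplay -nosplash -r "$BENCH"
  fi
done
```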

              Address space randomization was already turned off (set to 0) in sysctl.conf. (We were running a RHEL 6.2 2.6.32-23 kernel, which works well with ACML 5.2/FMA4 for HPL.)

              I haven't tried the non-FMA4 library. It sounds like you don't expect it to change anything, but I'll try it if you think it's worth it.

               

              Thanks for the response.

               

              Wesley