3 Replies Latest reply on Dec 4, 2009 7:39 PM by chipf

    Parallelized BLAS routines with OpenMP

    eS-Tea
      Which BLAS routines benefit from OpenMP?

      Hello everyone!

      I am currently checking whether the OpenMP version of ACML is suitable for our algorithms, which make extensive use of BLAS routines.
      The release notes of ACML version 2.5.3 indicate that "SMP support has been added to various key level 2 and level 3 BLAS routines".
      I also found a list of LAPACK routines parallelized with OpenMP in the release notes of ACML version 3.6.0, but I did not find a
      corresponding list for the BLAS routines.

      Can anyone provide a list of the BLAS routines that benefit from OpenMP?

      Regards,
      Sven

        • Parallelized BLAS routines with OpenMP
          chipf

          Most of the level 3 BLAS routines have OpenMP capability. This includes *gemm, *symm, *trmm, and *trsm. OpenMP has been implemented for all precisions.

          We've also added OpenMP to some of the level 2 routines, such as *gemv, *geru, and *gerc.

            • Parallelized BLAS routines with OpenMP
              eS-Tea

               

              Originally posted by: chipf
              Most of the level 3 BLAS routines have OpenMP capability. This includes *gemm, *symm, *trmm, and *trsm. OpenMP has been implemented for all precisions.
              We've also added OpenMP to some of the level 2 routines, such as *gemv, *geru, and *gerc.

               

              Hi,

              Thanks for the information. We use *gemv heavily, so that is exactly what I was hoping for.
              Unfortunately, the algorithm did not run in parallel at all during my tests yesterday.

              Coincidentally, I have just read about problem size and the parallel section in your second blog entry from SuperComputing 2009.
              Our problem sizes are extremely small by High Performance Computing standards, so I guess that explains why the
              algorithm never uses more than one thread.

              Regarding the BLAS routine *gemv, how large does the problem have to be before ACML enters a parallel section?
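
              To make the question concrete, here is a minimal sketch of how a size cutoff of this kind can work in an OpenMP
              matrix-vector product. It is not ACML's implementation: the function name gemv_sketch and the cutoff PAR_THRESHOLD
              are hypothetical, and the value 10000 is an arbitrary placeholder, since the real threshold is not stated in this thread.

```c
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical cutoff (number of matrix elements). NOT ACML's actual value. */
#define PAR_THRESHOLD 10000L

/*
 * Computes y = A * x for a row-major m-by-n matrix A.
 * The parallel region is only activated when the problem is large enough;
 * below the cutoff the OpenMP "if" clause keeps execution on one thread.
 * Returns the number of threads that executed the loop.
 */
static int gemv_sketch(int m, int n, const double *a, const double *x, double *y)
{
    int threads_used = 1;

    #pragma omp parallel if ((long)m * (long)n >= PAR_THRESHOLD)
    {
#ifdef _OPENMP
        /* One thread records the team size (1 when the "if" clause is false). */
        #pragma omp single
        threads_used = omp_get_num_threads();
#endif
        /* Rows are independent, so they can be split across the team. */
        #pragma omp for
        for (int i = 0; i < m; ++i) {
            double sum = 0.0;
            for (int j = 0; j < n; ++j)
                sum += a[(size_t)i * (size_t)n + (size_t)j] * x[j];
            y[i] = sum;
        }
    }
    return threads_used;
}
```

              With a cutoff like this, a call such as gemv_sketch(2, 3, a, x, y) always reports one thread, no matter how
              OMP_NUM_THREADS is set, which matches the behaviour I see for our small matrices.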

              Regards,
              Sven