0 Replies Latest reply on Mar 19, 2014 3:26 PM by lewisrd

    dsyev slowdown after dgels in ACML 5.3.1

    lewisrd

      I have observed surprising behavior when calling dsyev() after dgels() with ACML 5.3.1. A short example illustrating the issue follows this post. In that example, I first call dsyev(), which takes about 0.05 seconds; then I call dgels(); and finally I call dsyev() again. This second call to dsyev() takes about 45 seconds. The arguments to both calls to dsyev() are the same, yet the second call is about 900 times slower than the first! The disparity disappears if the intervening call to dgels() is removed, and it also disappears if I link against ACML 4.4.0 instead of 5.3.1.

       

      Here are some details of my computing environment:

      Compiler: gfortran 4.8.2

      ACML: 5.3.1/gfortran64_fma4_mp/lib/libacml_mp.a

      CPUs: Dual Opteron 4334, 12 cores total

      OS: Linux, x86_64, kernel version 2.6.32

      Compile command: gfortran -m64 -fopenmp -o dgelsfun dgelsfun.f90 acml/5.3.1/gfortran64_fma4_mp/lib/libacml_mp.a

       

      I would be happy to provide additional details of my computing environment if needed.

       

      I would be very grateful if someone could explain what I am seeing (perhaps by pointing out a bug in the code that follows!) and, more importantly, help me resolve the issue.

       

      Regards,

      --Ryan

       

      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      !
      ! NAME: dgelsfun.f90
      !
      ! DESC: A short example that exhibits unexpected behavior of dgels()
      !   and dsyev() in ACML 5.3.1.
      !
      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      PROGRAM dgelsfun
        IMPLICIT NONE

       

        INTEGER*4, PARAMETER :: m=965, n=42, nn=450, nr=440, lewrk=10*nn, &
                                llswrk=10*m

       

        REAL*8    :: Q(nn,nn), ewrk(lewrk), eig(nn), A(m,nr), B(m,n), &
                     lswrk(llswrk)
        INTEGER*4 :: info

       

        !
        ! Build an nn-by-nn real symmetric matrix Q and find its eigenvalues
        ! and eigenvectors
        !
        ! This call to dsyev() takes 0.051 seconds on a typical 12 core
        ! machine
        !
        CALL bldfrmtx(nn, nn, Q)
        CALL dsyev('V', 'U', nn, Q, nn, eig, ewrk, lewrk, info)
        IF (info /= 0) STOP "Fail"

       

        !
        ! Solve an m-by-nr least squares problem AX=B with n right hand sides
        !
        CALL bldfrmtx(m, nr, A)       ! A has full rank
        B(:,:) = 0.d0
        CALL dgels('N', m, nr, n, A, m, B, m, lswrk, llswrk, info)
        IF (info /= 0) STOP "Fail"

       

        !
        ! Build an nn-by-nn real symmetric matrix Q and find its eigenvalues
        ! and eigenvectors
        !
        ! This call to dsyev() takes 45.3 seconds on a typical 12 core
        ! machine
        !
        ! REMARK: This call to dsyev() is identical to the previous one, but
        !   it takes ~900 times longer to execute than that one when linked
        !   against ACML 5.3.1. If the above call to dgels() is commented
        !   out, then the execution times of both calls to dsyev() are
        !   identical.  If linked against ACML 4.4.0, then these execution
        !   times are equal regardless of the presence or absence of the
        !   call to dgels().
        !
        CALL bldfrmtx(nn, nn, Q)
        CALL dsyev('V', 'U', nn, Q, nn, eig, ewrk, lewrk, info)
        IF (info /= 0) STOP "Fail"

       

        STOP

       

      CONTAINS

       

        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        !
        ! Build a full rank matrix of size m-by-n: ones on the main
        ! diagonal, zeros elsewhere.
        !
        !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        SUBROUTINE bldfrmtx(m, n, A)
          IMPLICIT NONE

       

          INTEGER*4, INTENT(IN)  :: m, n
          REAL*8,    INTENT(OUT) :: A(m,n)

       

          INTEGER*4 :: i

       

          A(:,:) = 0.d0
          DO i=1,MIN(m,n)
             A(i,i) = 1.d0
          END DO

       

          RETURN

       

        END SUBROUTINE bldfrmtx

       

      END PROGRAM dgelsfun

       

      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
      !
      ! END: dgelsfun.f90
      !
      !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
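
The listing above reports timings but does not show how to measure them. The following is a minimal sketch of one way to time a single dsyev() call, not necessarily how the figures quoted above were obtained. It assumes wall-clock time from SYSTEM_CLOCK is the quantity of interest; with a multithreaded library, CPU_TIME may report CPU time summed over all threads. The program name timedsyev and the variables t0, t1 and rate are hypothetical. It must be linked against ACML or another LAPACK implementation.

```fortran
PROGRAM timedsyev
  IMPLICIT NONE

  INTEGER*4, PARAMETER :: nn=450, lewrk=10*nn

  REAL*8    :: Q(nn,nn), ewrk(lewrk), eig(nn)
  INTEGER*4 :: info, i
  INTEGER*8 :: t0, t1, rate     ! hypothetical names for the clock readings

  ! Same test matrix as in dgelsfun.f90: the nn-by-nn identity
  Q(:,:) = 0.d0
  DO i=1,nn
     Q(i,i) = 1.d0
  END DO

  ! Read the wall clock immediately before and after the call
  CALL SYSTEM_CLOCK(t0, rate)
  CALL dsyev('V', 'U', nn, Q, nn, eig, ewrk, lewrk, info)
  CALL SYSTEM_CLOCK(t1)
  IF (info /= 0) STOP "Fail"

  ! Elapsed seconds = clock ticks divided by ticks per second
  PRINT '(A,F10.3,A)', 'dsyev: ', REAL(t1-t0,8)/REAL(rate,8), ' s (wall clock)'
END PROGRAM timedsyev
```

The same pair of SYSTEM_CLOCK calls can be wrapped around each of the three library calls in dgelsfun.f90 to compare the first and second dsyev() timings directly.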