dsyev slowdown after dgels in ACML 5.3.1

Question asked by lewisrd on Mar 19, 2014

I have observed surprising behavior when calling dsyev() after dgels() when using ACML 5.3.1.  A short example illustrating the issue follows this post.  In that example, I first call dsyev(), which takes about 0.05 seconds, then I call dgels(), and finally I call dsyev() again.  This second call to dsyev() takes about 45 seconds.  The arguments to both calls to dsyev() are the same, but the second call is about 900 times slower than the first call!  This performance disparity does not appear if the intervening call to dgels() is removed; nor does it appear if I link against ACML 4.4.0 instead of 5.3.1.

 

Here are some details of my computing environment:

Compiler: gfortran 4.8.2

ACML: 5.3.1/gfortran64_fma4_mp/lib/libacml_mp.a

CPUs: Dual Opteron 4334, 12 cores total

OS: Linux, x86_64, kernel version 2.6.32

Compile command: gfortran -m64 -fopenmp -o dgelsfun dgelsfun.f90 acml/5.3.1/gfortran64_fma4_mp/lib/libacml_mp.a

 

I would be happy to provide additional details of my computing environment if needed.

 

I would be very grateful if someone can explain what I am seeing -- perhaps by pointing out a bug in the code that follows! -- and more importantly, help me to resolve the issue.

 

Regards,

--Ryan

 

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! NAME: dgelsfun.f90
!
! DESC: A short example that exhibits unexpected behavior of dgels()
!   and dsyev() in ACML 5.3.1.
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
PROGRAM dgelsfun
  IMPLICIT NONE

 

  INTEGER*4, PARAMETER :: m=965, n=42, nn=450, nr=440, lewrk=10*nn, &
                          llswrk=10*m

 

  REAL*8    :: Q(nn,nn), ewrk(lewrk), eig(nn), A(m,nr), B(m,n), &
               lswrk(llswrk)
  INTEGER*4 :: info

 

  !
  ! Build an nn-by-nn real symmetric matrix Q and find its eigenstuff
  !
  ! This call to dsyev() takes 0.051 seconds on a typical 12 core
  ! machine
  !
  CALL bldfrmtx(nn, nn, Q)
  CALL dsyev('V', 'U', nn, Q, nn, eig, ewrk, lewrk, info)
  IF (info /= 0) STOP "Fail"

 

  !
  ! Solve an m-by-nr least squares problem AX=B with n right hand sides
  !
  CALL bldfrmtx(m, nr, A)       ! A has full rank
  B(:,:) = 0.d0
  CALL dgels('N', m, nr, n, A, m, B, m, lswrk, llswrk, info)
  IF (info /= 0) STOP "Fail"

 

  !
  ! Build an nn-by-nn real symmetric matrix Q and find its eigenstuff
  !
  ! This call to dsyev() takes 45.3 seconds on a typical 12 core
  ! machine
  !
  ! REMARK: This call to dsyev() is identical to the previous one, but
  !   it takes ~900 times longer to execute than that one when linked
  !   against ACML 5.3.1. If the above call to dgels() is commented
  !   out, then the execution times of both calls to dsyev() are
  !   identical.  If linked against ACML 4.4.0, then these execution
  !   times are equal regardless of the presence or absence of the
  !   call to dgels().
  !
  CALL bldfrmtx(nn, nn, Q)
  CALL dsyev('V', 'U', nn, Q, nn, eig, ewrk, lewrk, info)
  IF (info /= 0) STOP "Fail"

 

  STOP

 

CONTAINS

 

  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  !
  ! Build a full rank matrix of size m-by-n.
  !
  !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
  SUBROUTINE bldfrmtx(m, n, A)
    IMPLICIT NONE

 

    INTEGER*4, INTENT(IN)  :: m, n
    REAL*8,    INTENT(OUT) :: A(m,n)

 

    INTEGER*4 :: i

 

    A(:,:) = 0.d0
    DO i=1,MIN(m,n)
       A(i,i) = 1.d0
    END DO

 

    RETURN

 

  END SUBROUTINE bldfrmtx

 

END PROGRAM dgelsfun

 

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!
! END: dgelsfun.f90
!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
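
The timings quoted in the comments above are not measured by the listing itself. A minimal sketch of how they could be reproduced follows, assuming the same matrix sizes and the same library as in the post. The program name timecalls and the helper report are hypothetical names introduced only for illustration. It uses SYSTEM_CLOCK rather than CPU_TIME because, with a multithreaded library, CPU_TIME may report CPU time accumulated across threads instead of elapsed time.

```fortran
! Sketch only: wall-clock timing of the three LAPACK calls in dgelsfun.f90.
! Assumes dsyev() and dgels() are supplied by the linked library.
PROGRAM timecalls
  IMPLICIT NONE

  INTEGER, PARAMETER :: m=965, n=42, nn=450, nr=440, lewrk=10*nn, &
                        llswrk=10*m

  REAL*8    :: Q(nn,nn), ewrk(lewrk), eig(nn), A(m,nr), B(m,n), &
               lswrk(llswrk)
  INTEGER   :: info
  INTEGER*8 :: t0, t1, rate            ! 64-bit tick counters

  CALL SYSTEM_CLOCK(COUNT_RATE=rate)   ! ticks per second

  ! First eigensolve
  CALL bldfrmtx(nn, nn, Q)
  CALL SYSTEM_CLOCK(t0)
  CALL dsyev('V', 'U', nn, Q, nn, eig, ewrk, lewrk, info)
  CALL SYSTEM_CLOCK(t1)
  IF (info /= 0) STOP "Fail"
  CALL report('dsyev (1st call)', t0, t1, rate)

  ! Least squares solve with n right hand sides
  CALL bldfrmtx(m, nr, A)
  B(:,:) = 0.d0
  CALL SYSTEM_CLOCK(t0)
  CALL dgels('N', m, nr, n, A, m, B, m, lswrk, llswrk, info)
  CALL SYSTEM_CLOCK(t1)
  IF (info /= 0) STOP "Fail"
  CALL report('dgels', t0, t1, rate)

  ! Second eigensolve, same arguments as the first
  CALL bldfrmtx(nn, nn, Q)
  CALL SYSTEM_CLOCK(t0)
  CALL dsyev('V', 'U', nn, Q, nn, eig, ewrk, lewrk, info)
  CALL SYSTEM_CLOCK(t1)
  IF (info /= 0) STOP "Fail"
  CALL report('dsyev (2nd call)', t0, t1, rate)

CONTAINS

  ! Print the elapsed wall-clock time between two tick counts.
  SUBROUTINE report(label, t0, t1, rate)
    CHARACTER(LEN=*), INTENT(IN) :: label
    INTEGER*8,        INTENT(IN) :: t0, t1, rate
    PRINT '(A,A,F10.3,A)', label, ': ', DBLE(t1-t0)/DBLE(rate), ' s'
  END SUBROUTINE report

  ! Same matrix builder as in dgelsfun.f90: ones on the main diagonal.
  SUBROUTINE bldfrmtx(m, n, A)
    INTEGER, INTENT(IN)  :: m, n
    REAL*8,  INTENT(OUT) :: A(m,n)
    INTEGER :: i
    A(:,:) = 0.d0
    DO i=1,MIN(m,n)
       A(i,i) = 1.d0
    END DO
  END SUBROUTINE bldfrmtx

END PROGRAM timecalls
```

This can be built with the same compile command as the original example, substituting the source file name.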
