3 Replies Latest reply on Sep 27, 2013 3:37 PM by mrip

    ACML version 5.3.1: dsyev slows down dramatically after dpotrf

    mrip

      I am running ACML version 5.3.1 (libacml_mp, using fma4) on Opteron 6348 processors under Ubuntu 12.04.

       

      Calls to dsyev (eigendecomposition) slow down dramatically, by a factor of 10, after a call to dpotrf (Cholesky decomposition).

      Here is a simple C program that reproduces the problem:

       

      #include <stdio.h>
      #include <stdlib.h>
      #include <time.h>
      #include <acml.h>

      int main(void) {
        double * x = malloc(1000000 * sizeof(double));
        double * y = malloc(1000000 * sizeof(double));
        double * eig0 = malloc(1000000 * sizeof(double));
        double * eig1 = malloc(1000000 * sizeof(double));
        double * eigw = malloc(1000 * sizeof(double));
        double * chol = malloc(1000000 * sizeof(double));

        clock_t t0,t1;
        int info;
        int i;

        // bail out if any allocation failed
        if(!x || !y || !eig0 || !eig1 || !eigw || !chol){
          fprintf(stderr, "out of memory\n");
          return 1;
        }

        // generate a random matrix
        for(i = 0; i<1000000; ++i){
          x[i] = rand() / (double) RAND_MAX;
        }

        // compute y = xx^T so that y is symmetric positive definite
        dgemm('N','T',1000,1000,1000,1.0,x,1000,x,1000,0.0,y,1000);

        // make a copy of y for cholesky and eigen decompositions
        for(i = 0; i<1000000; ++i){
          chol[i] = y[i];
          eig0[i] = y[i];
          eig1[i] = y[i];
        }

        // first eigenvalue test
        // clock() measures CPU time summed over all threads, not elapsed time;
        // with CLOCKS_PER_SEC == 1000000 (Linux) the value printed is in milliseconds
        t0 = clock();
        dsyev('V','U',1000,eig0,1000,eigw,&info);
        t1 = clock();
        printf("Eigen decomposition time: %ld\n", (long)((t1-t0)/1000));

        // cholesky
        dpotrf('U',1000,chol,1000,&info);

        // second eigenvalue test, after cholesky
        t0 = clock();
        dsyev('V','U',1000,eig1,1000,eigw,&info);
        t1 = clock();
        printf("Eigen decomposition time: %ld\n", (long)((t1-t0)/1000));

        free(x); free(y); free(eig0); free(eig1); free(eigw); free(chol);
        return 0;
      }

       

      Here is the output:

      Eigen decomposition time: 8120

      Eigen decomposition time: 95140

       

      If I comment out the dpotrf line, then it works fine:

      Eigen decomposition time: 8150

      Eigen decomposition time: 8210

       

      This seems like some kind of a bug.  Am I missing something?  Is there some kind of cache that I need to clear?

       

      Thanks.

       

      On edit: I get the same behavior whether I link against the gfortran64_mp or the gfortran64_fma4_mp versions of libacml_mp.

        • Re: ACML version 5.3.1: dsyev slows down dramatically after dpotrf
          timmy.liu

          Hi,

           

          Sorry for the late response. I tested your code on a Phenom and saw consistent performance when linking against gfortran64_mp. We recently had to move our lab, so it might take a few more days before I can run the test on an Opteron. I will let you know as soon as possible.

           

          Thanks,

          Timmy

            • Re: Re: ACML version 5.3.1: dsyev slows down dramatically after dpotrf
              mrip

              Hi.  Thanks for the reply.  Here is a little more info in case it is useful.  First, I have 4 processors, for a total of 12x4=48 cores.  I am not sure whether that makes any difference.

               

              I tried compiling and running the same code on a machine with a single Intel Core i3 processor (2 cores), and there was no slowdown at all.  Since you couldn't reproduce it on a Phenom either, maybe the issue is specific to Opterons.

               

              Back on the Opterons, if I link against the single-threaded version of the library (gfortran64), I don't get any slowdown.  If I link against ACML version 5.3.0 gfortran_mp, the problem mostly goes away.  However, there are still some strange inconsistencies in the performance of dsyev.  They aren't as severe (occasional calls slow down by maybe 2x or 3x) and they don't seem to be predictable, so I can't give you a simple main that reproduces the issue on 5.3.0.  Also, with 5.3.0 the performance of dsyev is inconsistent even without dpotrf, and I can't determine whether dpotrf makes a difference.

               

              Also, the degradation of performance is actually worse than 10x.  I first found this problem while running a version of R linked against ACML.  The sample program above measures only processor time, but the second call to dsyev seems to use almost no parallel processing, so the actual slowdown in elapsed time is on the order of 100x.  In R, it looks like this:

              > x<-array(rnorm(1e6),c(1000,1000))  ## random 1000x1000 matrix
              > x<-crossprod(x) ## make x symmetric positive definite by setting x=x^Tx
              > system.time(eigen(x))  ## eigen decomposition
                 user  system elapsed
                7.732   0.684   0.795
              > system.time(chol(x))   ## cholesky decomposition
                 user  system elapsed
                0.076   0.012   0.025
              > system.time(eigen(x))  ## eigen again
                 user  system elapsed
               84.764   0.656  79.449

              In fact, dsyev slows down so much that even the eigen decomposition of a 10x10 matrix takes 5 seconds:

              > y<-crossprod(array(rnorm(100),c(10,10)))
              > system.time(eigen(y))
                 user  system elapsed
                5.336   0.028   4.971

               

              Finally, I verified that the actual result of dsyev does not change after dpotrf.  That is, the output of the first (fast) eigendecomposition is identical to the output of the second (slow) one.  The same holds with version 5.3.0: even though the running time of dsyev is inconsistent and can vary by factors of up to about 3, the output is always (as far as I have tested) exactly identical.