
ACML version 5.3.1: dsyev slows down dramatically after dpotrf

Question asked by mrip on Sep 20, 2013
Latest reply on Sep 27, 2013 by mrip

I am running ACML version 5.3.1 (libacml_mp, FMA4 build) on Opteron 6348 processors under Ubuntu 12.04.

Calls to dsyev (eigendecomposition) slow down dramatically, by roughly a factor of 10, after a call to dpotrf (Cholesky decomposition).

Here is a simple C program that reproduces the problem:

 

#include <stdio.h>
#include <stdlib.h>
#include <acml.h>
#include <time.h>

int main(void) {
  double * x = malloc(1000000 * sizeof(double));
  double * y = malloc(1000000 * sizeof(double));
  double * eig0 = malloc(1000000 * sizeof(double));
  double * eig1 = malloc(1000000 * sizeof(double));
  double * eigw = malloc(1000 * sizeof(double));
  double * chol = malloc(1000000 * sizeof(double));

  clock_t t0,t1;
  int info;
  int i;

  // generate a random 1000 x 1000 matrix
  for(i = 0; i<1000000; ++i){
    x[i] = rand() / (double) RAND_MAX;
  }

  // compute y = xx^T so that y is symmetric positive definite
  dgemm('N','T',1000,1000,1000,1,x,1000,x,1000,0,y,1000);

  // make a copy of y for cholesky and eigen decompositions
  for(i = 0; i<1000000; ++i){
    chol[i] = y[i];
    eig0[i] = y[i];
    eig1[i] = y[i];
  }

  // first eigenvalue test
  t0 = clock();
  dsyev('V','U',1000,eig0,1000,eigw,&info);
  t1 = clock();
  // clock_t is not an int, so cast before printing; the value is clock() ticks / 1000
  printf("Eigen decomposition time: %ld\n", (long)((t1-t0)/1000));

  // cholesky
  dpotrf('U',1000,chol,1000,&info);

  // second eigenvalue test, after cholesky
  t0 = clock();
  dsyev('V','U',1000,eig1,1000,eigw,&info);
  t1 = clock();
  printf("Eigen decomposition time: %ld\n", (long)((t1-t0)/1000));

  return 0;
}

 

Here is the output (clock() ticks divided by 1000):

Eigen decomposition time: 8120
Eigen decomposition time: 95140

 

If I comment out the dpotrf line, the second dsyev call takes about as long as the first:

Eigen decomposition time: 8150
Eigen decomposition time: 8210

 

This looks like a bug. Am I missing something? Is there some kind of cache that I need to clear?

 

Thanks.

 

Edit: I get the same behavior whether I link against the gfortran64_mp or the gfortran64_fma4_mp build of libacml_mp.
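
A note on the measurement: the timings above come from clock(), which on Linux reports CPU time summed over all threads of the process rather than elapsed time. With the multithreaded libacml_mp the two can differ a lot, so it may be worth timing the same calls with a wall clock before drawing conclusions. Here is a minimal sketch, not part of the original program. The helper names wall_seconds and cpu_seconds are made up for illustration, and it assumes a POSIX system that provides clock_gettime (on older glibc, such as the one shipped with Ubuntu 12.04, link with -lrt).

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Hypothetical helper: elapsed wall-clock time in seconds,
   read from the POSIX monotonic clock. */
double wall_seconds(void) {
  struct timespec ts;
  clock_gettime(CLOCK_MONOTONIC, &ts);
  return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical helper: CPU time in seconds used by the process.
   On Linux this is summed over all threads of the process. */
double cpu_seconds(void) {
  return (double)clock() / CLOCKS_PER_SEC;
}
```

Calling both helpers immediately before and after each dsyev call and printing the two differences should show whether the elapsed time really grows after dpotrf, or only the CPU time added up across threads.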
