5 Replies Latest reply on Jun 6, 2013 11:36 AM by chipf

    dposv gives different answers based on the alignment of the memory of the arrays.

    nickr523

      I am wondering whether you would expect dposv to give different answers depending on the memory alignment of the arrays passed to it.  I see differences when the arrays are 8-byte aligned versus 16-byte aligned.  The difference appears only when I run on an AMD 6344 processor, and only with the gfortran64 library; gfortran64_fma4 produces stable answers independent of memory alignment.  I am using ACML 5.3.1.  I have tried this on a CentOS 6.4 system and an Ubuntu system, each with an AMD 6344 processor, and saw the same stability issue on both.  Running the same executable on several older AMD and Intel processors gives stable results.

       

      Is this a bug?  Is there some way to make dposv give stable answers independent of memory alignment?

       

      Thanks for your help,

       

      /Nick

        • Re: dposv gives different answers based on the alignment of the memory of the arrays.
          chipf

          I would not expect this result.  Can you supply a short example of the call you make, with data sizes and the other parameters specified?  We'll have to work up a test to attempt to duplicate the issue.

           

          Does threading make any difference?

            • Re: dposv gives different answers based on the alignment of the memory of the arrays.
              nickr523

              Hi Chip,

               

              I have created a small C++ test case that shows the problem.  The code is below.

               

              Thanks,

               

              /Nick

               

               

              #include <vector>

              #include <iostream>

              #include <string.h>

              #include "acml.h"

               

              using namespace std;

               

              ///////////////////////////////////////////////////////////////////////////////

              ///////////////////////////////////////////////////////////////////////////////

              static void load_arrays(int &N,

                          int &M,

                          vector<double> &tmpMat,

                          vector<double> &x,

                          int &ld_y)

              {

                tmpMat.clear();

                x.clear();

                tmpMat.push_back(0.48661195805418644422);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0092197191960041984665);

                tmpMat.push_back(-0.0058509282487692829827);

                tmpMat.push_back(0.010248259761012143074);

                tmpMat.push_back(0.015737552086924917055);

                tmpMat.push_back(0);

                tmpMat.push_back(1000.3448247270006277);

                tmpMat.push_back(-0.0019502433500000002781);

                tmpMat.push_back(-0.00029092731659999997509);

                tmpMat.push_back(-0.0015220754829999999253);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.011053156799012624906);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0012220956920000001361);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-1000);

                tmpMat.push_back(-0.30621568356239353692);

                tmpMat.push_back(-0.00059925961769999997169);

                tmpMat.push_back(-0.00054638867079999993859);

                tmpMat.push_back(-0.00021986270149999996505);

                tmpMat.push_back(-0.00046067169830000002581);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0019502433500000002781);

                tmpMat.push_back(1010.7353346993039622);

                tmpMat.push_back(-0.0026527929540000002619);

                tmpMat.push_back(-0.0025847757529999996866);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0027966937869999997264);

                tmpMat.push_back(-0.0021671577309999999657);

                tmpMat.push_back(-5.3291354724234070162);

                tmpMat.push_back(-5.3928699124989867286);

                tmpMat.push_back(-0.000633963606299999872);

                tmpMat.push_back(-0.00054368719919999997439);

                tmpMat.push_back(0);

                tmpMat.push_back(-277.35009811261454615);

                tmpMat.push_back(584.92724146779710281);

                tmpMat.push_back(-326.40455834145899416);

                tmpMat.push_back(360.09708837623895761);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00029092731659999997509);

                tmpMat.push_back(-0.0026527929540000002619);

                tmpMat.push_back(1.4226475298305729655);

                tmpMat.push_back(-0.0052590888670000007429);

                tmpMat.push_back(-0.02667308283050820461);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0001089282938999999972);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0019184374669999999189);

                tmpMat.push_back(-0.00015591433429999997985);

                tmpMat.push_back(-0.00083504797259999988951);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.37811046038376505685);

                tmpMat.push_back(-0.00014279830339999999918);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0015220754829999999253);

                tmpMat.push_back(-0.0025847757529999996866);

                tmpMat.push_back(-0.0052590888670000007429);

                tmpMat.push_back(1.4983298341396003028);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.026983125344459717859);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00033412225260000000954);

                tmpMat.push_back(-0.00075922918470000002288);

                tmpMat.push_back(-0.00012293166689999999205);

                tmpMat.push_back(-0.00052180725380000000501);

                tmpMat.push_back(-0.0019773690229999999965);

                tmpMat.push_back(-0.3141558836619546824);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.02667308283050820461);

                tmpMat.push_back(0);

                tmpMat.push_back(0.060611667334565799692);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0013066319729999999535);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.015871843010221438341);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.016760109519836155489);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.011053156799012624906);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0.011053156800012625335);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(1000.0298451488444016);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0036431166019999998255);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0096674305781100505763);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.016543601663148714553);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0012220956920000001361);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0001089282938999999972);

                tmpMat.push_back(-0.026983125344459717859);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(14000.028662056060057);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(-0.00027528183419999998011);

                tmpMat.push_back(-0.0001986248876999999885);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-999.99999100000013641);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(999.99999100000115959);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-1000);

                tmpMat.push_back(-0.0027966937869999997264);

                tmpMat.push_back(-0.0019184374669999999189);

                tmpMat.push_back(-0.00033412225260000000954);

                tmpMat.push_back(-0.0013066319729999999535);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00027528183419999998011);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(1000.0566179472461954);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.024184262621089257922);

                tmpMat.push_back(-0.0019960532849999999654);

                tmpMat.push_back(-0.023060810955854587484);

                tmpMat.push_back(-0.00074565306950000003177);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.30621568356239353692);

                tmpMat.push_back(-0.0021671577309999999657);

                tmpMat.push_back(-0.00015591433429999997985);

                tmpMat.push_back(-0.00075922918470000002288);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0036431166019999998255);

                tmpMat.push_back(-0.0001986248876999999885);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0.33113653125362019214);

                tmpMat.push_back(-0.00052257488960000004347);

                tmpMat.push_back(-0.0083605717738728796418);

                tmpMat.push_back(-0.00013627741459999997151);

                tmpMat.push_back(-0.0089773808724537568909);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00059925961769999997169);

                tmpMat.push_back(-5.3291354724234070162);

                tmpMat.push_back(-0.00083504797259999988951);

                tmpMat.push_back(-0.00012293166689999999205);

                tmpMat.push_back(-0.015871843010221438341);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.024184262621089257922);

                tmpMat.push_back(-0.00052257488960000004347);

                tmpMat.push_back(5.3718801755955176702);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00060878339299999994962);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00054638867079999993859);

                tmpMat.push_back(-5.3928699124989867286);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00052180725380000000501);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0096674305781100505763);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0019960532849999999654);

                tmpMat.push_back(-0.0083605717738728796418);

                tmpMat.push_back(0);

                tmpMat.push_back(5.4150206107131442224);

                tmpMat.push_back(-3.7549975749999998267e-06);

                tmpMat.push_back(-0.0010546916539999999507);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00021986270149999996505);

                tmpMat.push_back(-0.000633963606299999872);

                tmpMat.push_back(-0.37811046038376505685);

                tmpMat.push_back(-0.0019773690229999999965);

                tmpMat.push_back(-0.016760109519836155489);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.023060810955854587484);

                tmpMat.push_back(-0.00013627741459999997151);

                tmpMat.push_back(-0.00060878339299999994962);

                tmpMat.push_back(-3.7549975749999998267e-06);

                tmpMat.push_back(0.4218512460386307783);

                tmpMat.push_back(-0.00033985404220000000729);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00046067169830000002581);

                tmpMat.push_back(-0.00054368719919999997439);

                tmpMat.push_back(-0.00014279830339999999918);

                tmpMat.push_back(-0.3141558836619546824);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.016543601663148714553);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.00074565306950000003177);

                tmpMat.push_back(-0.0089773808724537568909);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.0010546916539999999507);

                tmpMat.push_back(-0.00033985404220000000729);

                tmpMat.push_back(0.34296422216515715098);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0.56650954492184457667);

                tmpMat.push_back(-0.012391597323698119379);

                tmpMat.push_back(-0.017338387540017579508);

                tmpMat.push_back(-0.022742406234882210636);

                tmpMat.push_back(0.0031164618298733779735);

                tmpMat.push_back(-0.0092197191960041984665);

                tmpMat.push_back(0);

                tmpMat.push_back(-277.35009811261454615);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.012391597323698119379);

                tmpMat.push_back(76.982750435796006627);

                tmpMat.push_back(-162.20954888224753176);

                tmpMat.push_back(90.529692639616811789);

                tmpMat.push_back(-99.854109879548900608);

                tmpMat.push_back(-0.0058509282487692829827);

                tmpMat.push_back(0);

                tmpMat.push_back(584.92724146779710281);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.017338387540017579508);

                tmpMat.push_back(-162.20954888224756019);

                tmpMat.push_back(342.18339070233224675);

                tmpMat.push_back(-190.8989302132812611);

                tmpMat.push_back(210.62274111233483609);

                tmpMat.push_back(0.010248259761012143074);

                tmpMat.push_back(0);

                tmpMat.push_back(-326.40455834145899416);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(-0.022742406234882210636);

                tmpMat.push_back(90.529692639616826);

                tmpMat.push_back(-190.8989302132812611);

                tmpMat.push_back(106.60832914501757784);

                tmpMat.push_back(-117.54529340644536717);

                tmpMat.push_back(0.015737552086924917055);

                tmpMat.push_back(0);

                tmpMat.push_back(360.09708837623895761);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0);

                tmpMat.push_back(0.0031164618298733779735);

                tmpMat.push_back(-99.854109879548914819);

                tmpMat.push_back(210.62274111233480767);

                tmpMat.push_back(-117.54529340644536717);

                tmpMat.push_back(129.74091356114340101);

                x.push_back(-4.833163326516659343e-09);

                x.push_back(8.3203109888329063767e-05);

                x.push_back(1.4624850340878986781e-12);

                x.push_back(8.3239100409004817599e-17);

                x.push_back(0);

                x.push_back(4.5252164891245906898e-11);

                x.push_back(0);

                x.push_back(3.3527001360103670688e-11);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(0);

                x.push_back(-2.2851566602218010691e-11);

                x.push_back(-3.6763442131705016109e-11);

                x.push_back(-2.2400220539693724943e-11);

                x.push_back(3.2366579593645017098e-12);

                x.push_back(-6.7484307379394405878e-09);

                x.push_back(-4.5402502067155604965e-13);

                x.push_back(8.1318651764357171588e-13);

                x.push_back(-4.7928431215081953678e-13);

                x.push_back(5.2881416852993767432e-13);

               

                N = 33;

                M = 1;

                ld_y = 33;

              }

               

              ////////////////////////////////////////////////////////////////////////////////

              ////////////////////////////////////////////////////////////////////////////////

              int main(int argc, char **argv)

              {

                int n = 11;

                int maxoffset = 263;

                int all_size = 1000000;

                vector<double> tmpMat(all_size);

                vector<double> x(all_size);

                int N, M, ld_y;

               

                cout.precision(20);

               

                load_arrays(N, M, tmpMat, x, ld_y);

               

                vector<double> all_mem1(all_size);

                vector<double> all_mem2(all_size);

                vector<double> all_mem_out_saved(all_size);

                double *saved = &all_mem_out_saved[0];

                for ( int offset = 0; offset < maxoffset; ++offset )

                {

                  double *input = reinterpret_cast<double *>(reinterpret_cast<char *>(&all_mem1[0]) + offset);

                  double *output = &all_mem2[0];

                  memcpy(input, &tmpMat[0], N*N*sizeof(double));

                  memcpy(output, &x[0], N*M*sizeof(double));

               

                  int info;

                  dposv('U', N, M, input, ld_y, output, ld_y, &info);

                  if ( info != 0 )

                  {

                    cout << " dposv failed, info = " << info << " (matrix not positive definite?)\n";

                    return -1;

                  }

                  if ( offset == 0 )

                  {

                    // save the result of the first (offset 0) run as the reference

                    dcopy(N * M, output, 1, saved, 1);

                  }

                  else

                  {

                    // compare with offset 0; the solution holds N * M entries

                    for ( int i = 0; i < N * M; ++i )

                    {

                      double dif = output[i] - saved[i];

                      if ( dif != 0.0 )

                      {

                        cout << " FOUND dif for offset " << offset

                             << " at " << i <<  "    " << output[i]

                             << "   " << saved[i] << "\n";

                      }

                    }

                  }

                }

                return 0;

              }

                • Re: dposv gives different answers based on the alignment of the memory of the arrays.
                  chipf

                  Sorry for the delayed response; we're using this as an exercise for our new engineer.

                  We were able to run the code.  At first it did not detect any mismatches, but that first test was run on an AMD Phenom II.  We then reran on another machine with Abu Dhabi processors and duplicated the issues you report in this post and in the new post about DGEMM.  There is an issue in the library, and it is machine dependent.  The issue we see is a precision failure (in the last 2 decimal digits) when the alignment is an odd multiple of 8 bytes.
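To make the alignment condition concrete, here is a minimal sketch of a helper that sorts a pointer into the cases described above. The names `Align` and `classify_alignment` are hypothetical and not part of ACML; the helper only inspects the address and says nothing about what the library does internally.

```cpp
#include <cstdint>

// Hypothetical helper for illustration; not part of ACML.
enum class Align
{
  Sixteen,   // address is a multiple of 16 bytes
  OddEight,  // address is an odd multiple of 8 bytes (8, 24, 40, ...)
  Other      // address is not even 8-byte aligned
};

Align classify_alignment(const void *p)
{
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  if ( addr % 16 == 0 )
    return Align::Sixteen;
  if ( addr % 8 == 0 )
    return Align::OddEight;   // multiple of 8 but not of 16
  return Align::Other;
}
```

In the test loop posted earlier in the thread, printing `classify_alignment(input)` next to each `offset` would make it easier to see whether the mismatches line up with the odd-multiple-of-8 case.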

                   

                  The problem is most likely in DGEMM, which checks for FMA capability and runs different code if FMA is available.  We have extensive testing, but it did not catch this problem.  We'll have to figure out why not and then add this as a test case (with your permission).
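As general background on why an FMA code path and a non-FMA code path can disagree in the last bits, here is a small sketch. It is not a diagnosis of the library issue and not ACML code; `fused` and `unfused` are hypothetical names used only for this illustration.

```cpp
#include <cmath>

// Hypothetical helpers for illustration; not ACML code.

// One rounding: a*b + c is formed exactly and rounded once.
double fused(double a, double b, double c)
{
  return std::fma(a, b, c);
}

// Two roundings: the product is rounded to double before the add.
// The volatile store keeps the compiler from contracting this into an FMA.
double unfused(double a, double b, double c)
{
  volatile double prod = a * b;
  return prod + c;
}
```

With a = b = 1 + 2^-27 and c = -(1 + 2^-26), the exact product is 1 + 2^-26 + 2^-54. The unfused version rounds the product to 1 + 2^-26 and returns 0, while the fused version returns 2^-54. Differences of this kind accumulate inside a matrix multiply, so two code paths that mix fused and unfused operations differently need not agree bit for bit.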

                   

                  We haven't found the root cause yet, but it shouldn't take much longer.  For now I can't recommend any workaround that won't kill performance.  If performance is not an issue, you could set ACML_FMA=0 and the code would use the SSE2 assembly code paths, which work properly.
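A minimal sketch of setting that variable from inside the test program follows. It assumes a POSIX system and, importantly, assumes the library reads ACML_FMA after the call has run; this thread does not say when the variable is read, so exporting it in the shell before launching the program is the safer route. The helper names are hypothetical.

```cpp
#include <stdlib.h>   // setenv, getenv (POSIX)
#include <string.h>   // strcmp

// Hypothetical helper: request the non-FMA code paths via ACML_FMA=0.
// Assumption: the library consults the variable after this call.
bool disable_acml_fma()
{
  return setenv("ACML_FMA", "0", 1 /* overwrite existing value */) == 0;
}

// Returns true if ACML_FMA is currently "0" in this process's environment.
bool acml_fma_disabled()
{
  const char *v = getenv("ACML_FMA");
  return v != nullptr && strcmp(v, "0") == 0;
}
```

Calling `disable_acml_fma()` at the top of `main`, before the first `dposv` call, and then rerunning the offset loop would show whether the mismatches disappear on the affected machine.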