
samthesaab
Journeyman III

clAmdBlasSgemv Error Compared to CPU Results

I am running test cases on the clAmdBlasSgemv and clAmdBlasDgemv routines and comparing their results with standard CPU BLAS routines, and the answers differ slightly. Any ideas why there is a slight error?

Attached is the single-precision code I wrote. I started from the example program in the clAmdBlas sample directory and modified it to run over a range of matrix sizes and compare the GPU answer with the CPU BLAS result.

My system has the following specs.

amd-driver-installer-catalyst-13.2-beta6-linux-x86.x86_64

AMD-APP-SDK-v2.8-lnx64

clAmdBlas-1.8.291

Linux nusselt 3.4.34 #4 SMP Fri Mar 1 23:30:31 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

3x AMD Radeon HD 7900 Series  

Here are the results. The program fills the matrix and vector with random values, and I am only comparing the first element of the output vector.

nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv

    M      N   Ygpu(1)         Ycpu(1)         Percent Error(1)
======================================================================
   64     64   1.411686e+01    1.328439e+01    6.266490e+00
   71     71   3.106447e+01    2.828872e+01    9.812211e+00
   79     79   4.925335e+01    4.468921e+01    1.021307e+01
   88     88   6.938486e+01    6.311882e+01    9.927360e+00
   97     97   9.144693e+01    8.144869e+01    1.227550e+01
  107    107   1.166902e+02    1.032968e+02    1.296601e+01
  118    118   1.448911e+02    1.272231e+02    1.388736e+01
  130    130   1.764819e+02    1.583401e+02    1.145749e+01
  144    144   2.113761e+02    1.929213e+02    9.565993e+00
  159    159   2.497652e+02    2.324539e+02    7.447187e+00
  176    176   2.925247e+02    2.744401e+02    6.589621e+00
  194    194   3.394406e+02    3.219393e+02    5.436186e+00
  214    214   3.910286e+02    3.737681e+02    4.617962e+00
  236    236   4.464761e+02    4.299068e+02    3.854162e+00
  260    260   5.064575e+02    4.916926e+02    3.002871e+00
  287    287   5.724889e+02    5.609023e+02    2.065709e+00
  316    316   6.502740e+02    6.369617e+02    2.089981e+00
  348    348   7.394334e+02    7.256129e+02    1.904665e+00
  383    383   8.319628e+02    8.213775e+02    1.288721e+00
  422    422   9.353116e+02    9.255268e+02    1.057220e+00
  465    465   1.042261e+03    1.041738e+03    5.017624e-02
  512    512   1.165649e+03    1.167591e+03    1.663059e-01
  564    564   1.301569e+03    1.302999e+03    1.097603e-01
  621    621   1.453288e+03    1.458167e+03    3.346000e-01
  684    684   1.624285e+03    1.633837e+03    5.845990e-01
  753    753   1.812988e+03    1.821587e+03    4.720609e-01
  829    829   2.020191e+03    2.025767e+03    2.752382e-01
  913    913   2.247140e+03    2.244898e+03    9.987926e-02
 1005   1005   2.492113e+03    2.492683e+03    2.285988e-02
 1106   1106   2.761114e+03    2.760242e+03    3.159398e-02
 1217   1217   3.048123e+03    3.065077e+03    5.531300e-01
 1339   1339   3.363256e+03    3.400057e+03    1.082365e+00
 1474   1474   3.713925e+03    3.762384e+03    1.288006e+00
 1622   1622   4.103960e+03    4.161901e+03    1.392174e+00
 1785   1785   4.543989e+03    4.602355e+03    1.268171e+00
 1964   1964   5.033531e+03    5.088098e+03    1.072432e+00
 2161   2161   5.571005e+03    5.626964e+03    9.944705e-01
 2378   2378   6.160100e+03    6.224594e+03    1.036126e+00
 2616   2616   6.809108e+03    6.874081e+03    9.451832e-01
 2878   2878   7.523969e+03    7.579605e+03    7.340254e-01
 3166   3166   8.312679e+03    8.349850e+03    4.451685e-01
 3483   3483   9.177878e+03    9.232830e+03    5.951821e-01
 3832   3832   1.012808e+04    1.019026e+04    6.101776e-01
 4216   4216   1.118314e+04    1.124070e+04    5.120725e-01
 4638   4638   1.235017e+04    1.238841e+04    3.086617e-01
 5102   5102   1.364414e+04    1.369116e+04    3.434441e-01
 5613   5613   1.506476e+04    1.510302e+04    2.533318e-01
 6175   6175   1.661373e+04    1.662702e+04    7.991275e-02
 6793   6793   1.833668e+04    1.833816e+04    8.083808e-03
 7473   7473   2.023177e+04    2.017815e+04    2.657093e-01
 8221   8221   2.229929e+04    2.220930e+04    4.052005e-01
 9044   9044   2.456117e+04    2.442957e+04    5.386818e-01
 9949   9949   2.703839e+04    2.691752e+04    4.490497e-01
10945  10945   2.974846e+04    2.965196e+04    3.254555e-01
12040  12040   3.270365e+04    3.265572e+04    1.467608e-01
13245  13245   3.597929e+04    3.594607e+04    9.242361e-02

A second run produced exactly the same output for every matrix size.

samthesaab
Journeyman III

I think I solved my own problem. The CPU BLAS routines are column major, so I should set:

static const clAmdBlasOrder order = clAmdBlasColumnMajor; // was clAmdBlasRowMajor

nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv

    M      N   Ygpu(1)         Ycpu(1)         Percent Error(1)
======================================================================
   64     64   2.194825e+01    2.194822e+01    1.303533e-04
   71     71   3.695258e+01    3.695256e+01    6.193938e-05
   79     79   5.335306e+01    5.335305e+01    2.144974e-05
   88     88   7.178266e+01    7.178265e+01    2.125693e-05
   97     97   9.011253e+01    9.011250e+01    2.539956e-05
  107    107   1.119606e+02    1.119606e+02    2.044307e-05
  118    118   1.358870e+02    1.358869e+02    2.245806e-05
  130    130   1.670040e+02    1.670040e+02    9.136783e-06
  144    144   2.015851e+02    2.015851e+02    0.000000e+00
  159    159   2.411177e+02    2.411178e+02    6.328356e-06
  176    176   2.831040e+02    2.831039e+02    2.155928e-05
  194    194   3.306032e+02    3.306031e+02    3.692352e-05
  214    214   3.824321e+02    3.824319e+02    6.383898e-05
  236    236   4.385708e+02    4.385706e+02    4.870893e-05
  260    260   5.003567e+02    5.003564e+02    6.709086e-05
  287    287   5.695662e+02    5.695659e+02    5.358041e-05
  316    316   6.456257e+02    6.456253e+02    6.617555e-05
  348    348   7.342762e+02    7.342766e+02    4.987371e-05
  383    383   8.300407e+02    8.300411e+02    5.147288e-05
  422    422   9.341904e+02    9.341904e+02    6.533481e-06
  465    465   1.050402e+03    1.050402e+03    6.972775e-05
  512    512   1.176255e+03    1.176255e+03    1.037788e-05
  564    564   1.311663e+03    1.311663e+03    9.306530e-06
  621    621   1.466831e+03    1.466832e+03    4.993224e-05
  684    684   1.642501e+03    1.642501e+03    7.431980e-06
  753    753   1.830251e+03    1.830251e+03    6.669595e-06
  829    829   2.034431e+03    2.034431e+03    6.000220e-06
  913    913   2.253564e+03    2.253562e+03    1.191690e-04
 1005   1005   2.501351e+03    2.501346e+03    1.952074e-04
 1106   1106   2.768912e+03    2.768905e+03    2.468823e-04
 1217   1217   3.073746e+03    3.073740e+03    1.906269e-04
 1339   1339   3.408728e+03    3.408721e+03    2.005426e-04
 1474   1474   3.771056e+03    3.771048e+03    2.071705e-04
 1622   1622   4.170569e+03    4.170564e+03    1.170780e-04
 1785   1785   4.611025e+03    4.611019e+03    1.376628e-04
 1964   1964   5.096781e+03    5.096761e+03    3.927893e-04
 2161   2161   5.635651e+03    5.635627e+03    4.245451e-04
 2378   2378   6.233284e+03    6.233258e+03    4.151746e-04
 2616   2616   6.882771e+03    6.882744e+03    3.972798e-04
 2878   2878   7.588303e+03    7.588269e+03    4.439933e-04
 3166   3166   8.358554e+03    8.358509e+03    5.374389e-04
 3483   3483   9.241557e+03    9.241489e+03    7.291337e-04
 3832   3832   1.019895e+04    1.019892e+04    2.776794e-04
 4216   4216   1.124939e+04    1.124936e+04    2.777935e-04
 4638   4638   1.239710e+04    1.239707e+04    1.890567e-04
 5102   5102   1.369983e+04    1.369982e+04    7.128287e-05
 5613   5613   1.511168e+04    1.511168e+04    1.938691e-05
 6175   6175   1.663567e+04    1.663567e+04    2.348116e-05
 6793   6793   1.834676e+04    1.834681e+04    3.087219e-04
 7473   7473   2.018678e+04    2.018681e+04    1.548041e-04
 8221   8221   2.221797e+04    2.221795e+04    8.790752e-05
 9044   9044   2.443827e+04    2.443822e+04    2.077944e-04
 9949   9949   2.692621e+04    2.692617e+04    1.595799e-04
10945  10945   2.966064e+04    2.966061e+04    1.251133e-04
12040  12040   3.266446e+04    3.266437e+04    2.810306e-04
13245  13245   3.595488e+04    3.595472e+04    4.345743e-04


Good that you found this out.

Just FYI, you can still expect slight deviations, because floating-point results depend on the order in which the operations are carried out. A parallel breakdown of the work changes that order, so small differences from the CPU result are expected.
