
clAmdBlasSgemv Error Compared to CPU Results

Question asked by samthesaab on Mar 4, 2013
Latest reply on Mar 5, 2013 by himanshu.gautam

I am running test cases on the clAmdBlasSgemv and clAmdBlasDgemv routines and comparing them against standard CPU BLAS routines, and the answers do not match: the difference is up to about 14% in the single-precision results below. Any ideas where this error comes from?

 

Attached is the single-precision code I wrote. I took the example program provided in the clAmdBlas sample directory and modified it to run over a range of matrix sizes and compare the GPU answer to the CPU BLAS result.

 

My system has the following specs.

amd-driver-installer-catalyst-13.2-beta6-linux-x86.x86_64

AMD-APP-SDK-v2.8-lnx64

clAmdBlas-1.8.291

Linux nusselt 3.4.34 #4 SMP Fri Mar 1 23:30:31 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

3x AMD Radeon HD 7900 Series  

 

Here are the results. The code fills the matrix and the vector with random values. I am only comparing the first element of the output vector.

nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv

 

  M     N    Ygpu(1)       Ycpu(1)     Percent Error(1)

======================================================================

   64    64   1.411686e+01       1.328439e+01      6.266490e+00

   71    71   3.106447e+01       2.828872e+01      9.812211e+00

   79    79   4.925335e+01       4.468921e+01      1.021307e+01

   88    88   6.938486e+01       6.311882e+01      9.927360e+00

   97    97   9.144693e+01       8.144869e+01      1.227550e+01

  107   107   1.166902e+02       1.032968e+02      1.296601e+01

  118   118   1.448911e+02       1.272231e+02      1.388736e+01

  130   130   1.764819e+02       1.583401e+02      1.145749e+01

  144   144   2.113761e+02       1.929213e+02      9.565993e+00

  159   159   2.497652e+02       2.324539e+02      7.447187e+00

  176   176   2.925247e+02       2.744401e+02      6.589621e+00

  194   194   3.394406e+02       3.219393e+02      5.436186e+00

  214   214   3.910286e+02       3.737681e+02      4.617962e+00

  236   236   4.464761e+02       4.299068e+02      3.854162e+00

  260   260   5.064575e+02       4.916926e+02      3.002871e+00

  287   287   5.724889e+02       5.609023e+02      2.065709e+00

  316   316   6.502740e+02       6.369617e+02      2.089981e+00

  348   348   7.394334e+02       7.256129e+02      1.904665e+00

  383   383   8.319628e+02       8.213775e+02      1.288721e+00

  422   422   9.353116e+02       9.255268e+02      1.057220e+00

  465   465   1.042261e+03       1.041738e+03      5.017624e-02

  512   512   1.165649e+03       1.167591e+03      1.663059e-01

  564   564   1.301569e+03       1.302999e+03      1.097603e-01

  621   621   1.453288e+03       1.458167e+03      3.346000e-01

  684   684   1.624285e+03       1.633837e+03      5.845990e-01

  753   753   1.812988e+03       1.821587e+03      4.720609e-01

  829   829   2.020191e+03       2.025767e+03      2.752382e-01

  913   913   2.247140e+03       2.244898e+03      9.987926e-02

1005  1005   2.492113e+03       2.492683e+03      2.285988e-02

1106  1106   2.761114e+03       2.760242e+03      3.159398e-02

1217  1217   3.048123e+03       3.065077e+03      5.531300e-01

1339  1339   3.363256e+03       3.400057e+03      1.082365e+00

1474  1474   3.713925e+03       3.762384e+03      1.288006e+00

1622  1622   4.103960e+03       4.161901e+03      1.392174e+00

1785  1785   4.543989e+03       4.602355e+03      1.268171e+00

1964  1964   5.033531e+03       5.088098e+03      1.072432e+00

2161  2161   5.571005e+03       5.626964e+03      9.944705e-01

2378  2378   6.160100e+03       6.224594e+03      1.036126e+00

2616  2616   6.809108e+03       6.874081e+03      9.451832e-01

2878  2878   7.523969e+03       7.579605e+03      7.340254e-01

3166  3166   8.312679e+03       8.349850e+03      4.451685e-01

3483  3483   9.177878e+03       9.232830e+03      5.951821e-01

3832  3832   1.012808e+04       1.019026e+04      6.101776e-01

4216  4216   1.118314e+04       1.124070e+04      5.120725e-01

4638  4638   1.235017e+04       1.238841e+04      3.086617e-01

5102  5102   1.364414e+04       1.369116e+04      3.434441e-01

5613  5613   1.506476e+04       1.510302e+04      2.533318e-01

6175  6175   1.661373e+04       1.662702e+04      7.991275e-02

6793  6793   1.833668e+04       1.833816e+04      8.083808e-03

7473  7473   2.023177e+04       2.017815e+04      2.657093e-01

8221  8221   2.229929e+04       2.220930e+04      4.052005e-01

9044  9044   2.456117e+04       2.442957e+04      5.386818e-01

9949  9949   2.703839e+04       2.691752e+04      4.490497e-01

10945 10945   2.974846e+04       2.965196e+04      3.254555e-01

12040 12040   3.270365e+04       3.265572e+04      1.467608e-01

13245 13245   3.597929e+04       3.594607e+04      9.242361e-02
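
The Percent Error(1) column is consistent with |Ygpu - Ycpu| / |Ycpu| x 100 (the first row works out to about 6.27%). Since only the first element is compared, a check over the whole output vector might show whether the mismatch is isolated or systematic. Below is a minimal sketch in Python, not the attached program. `percent_error` and `max_percent_error` are hypothetical helper names, and the three-element vectors are made-up illustration values.

```python
def percent_error(gpu, cpu):
    """Percent difference of the GPU value relative to the CPU reference.

    Assumes cpu is nonzero; a zero reference would need an absolute check.
    """
    return abs(gpu - cpu) / abs(cpu) * 100.0


def max_percent_error(y_gpu, y_cpu):
    """Return (index, percent error) of the worst element in the vectors."""
    if len(y_gpu) != len(y_cpu):
        raise ValueError("output vectors must have the same length")
    worst = max(range(len(y_cpu)),
                key=lambda i: percent_error(y_gpu[i], y_cpu[i]))
    return worst, percent_error(y_gpu[worst], y_cpu[worst])


if __name__ == "__main__":
    # First row of the table (M = N = 64), values copied from the output.
    print(percent_error(1.411686e+01, 1.328439e+01))

    # Made-up vectors, only to show the whole-vector check.
    y_gpu = [1.0, 2.2, 3.0]
    y_cpu = [1.0, 2.0, 3.0]
    print(max_percent_error(y_gpu, y_cpu))
```

Reporting the worst element and its index, rather than element 1 alone, would also make it easier to see whether the error depends on the position in the vector.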

 

Here is a second run. The output is identical to the first, row for row:

nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv

 

  M     N    Ygpu(1)       Ycpu(1)     Percent Error(1)

======================================================================

   64    64   1.411686e+01       1.328439e+01      6.266490e+00

   71    71   3.106447e+01       2.828872e+01      9.812211e+00

   79    79   4.925335e+01       4.468921e+01      1.021307e+01

   88    88   6.938486e+01       6.311882e+01      9.927360e+00

   97    97   9.144693e+01       8.144869e+01      1.227550e+01

  107   107   1.166902e+02       1.032968e+02      1.296601e+01

  118   118   1.448911e+02       1.272231e+02      1.388736e+01

  130   130   1.764819e+02       1.583401e+02      1.145749e+01

  144   144   2.113761e+02       1.929213e+02      9.565993e+00

  159   159   2.497652e+02       2.324539e+02      7.447187e+00

  176   176   2.925247e+02       2.744401e+02      6.589621e+00

  194   194   3.394406e+02       3.219393e+02      5.436186e+00

  214   214   3.910286e+02       3.737681e+02      4.617962e+00

  236   236   4.464761e+02       4.299068e+02      3.854162e+00

  260   260   5.064575e+02       4.916926e+02      3.002871e+00

  287   287   5.724889e+02       5.609023e+02      2.065709e+00

  316   316   6.502740e+02       6.369617e+02      2.089981e+00

  348   348   7.394334e+02       7.256129e+02      1.904665e+00

  383   383   8.319628e+02       8.213775e+02      1.288721e+00

  422   422   9.353116e+02       9.255268e+02      1.057220e+00

  465   465   1.042261e+03       1.041738e+03      5.017624e-02

  512   512   1.165649e+03       1.167591e+03      1.663059e-01

  564   564   1.301569e+03       1.302999e+03      1.097603e-01

  621   621   1.453288e+03       1.458167e+03      3.346000e-01

  684   684   1.624285e+03       1.633837e+03      5.845990e-01

  753   753   1.812988e+03       1.821587e+03      4.720609e-01

  829   829   2.020191e+03       2.025767e+03      2.752382e-01

  913   913   2.247140e+03       2.244898e+03      9.987926e-02

1005  1005   2.492113e+03       2.492683e+03      2.285988e-02

1106  1106   2.761114e+03       2.760242e+03      3.159398e-02

1217  1217   3.048123e+03       3.065077e+03      5.531300e-01

1339  1339   3.363256e+03       3.400057e+03      1.082365e+00

1474  1474   3.713925e+03       3.762384e+03      1.288006e+00

1622  1622   4.103960e+03       4.161901e+03      1.392174e+00

1785  1785   4.543989e+03       4.602355e+03      1.268171e+00

1964  1964   5.033531e+03       5.088098e+03      1.072432e+00

2161  2161   5.571005e+03       5.626964e+03      9.944705e-01

2378  2378   6.160100e+03       6.224594e+03      1.036126e+00

2616  2616   6.809108e+03       6.874081e+03      9.451832e-01

2878  2878   7.523969e+03       7.579605e+03      7.340254e-01

3166  3166   8.312679e+03       8.349850e+03      4.451685e-01

3483  3483   9.177878e+03       9.232830e+03      5.951821e-01

3832  3832   1.012808e+04       1.019026e+04      6.101776e-01

4216  4216   1.118314e+04       1.124070e+04      5.120725e-01

4638  4638   1.235017e+04       1.238841e+04      3.086617e-01

5102  5102   1.364414e+04       1.369116e+04      3.434441e-01

5613  5613   1.506476e+04       1.510302e+04      2.533318e-01

6175  6175   1.661373e+04       1.662702e+04      7.991275e-02

6793  6793   1.833668e+04       1.833816e+04      8.083808e-03

7473  7473   2.023177e+04       2.017815e+04      2.657093e-01

8221  8221   2.229929e+04       2.220930e+04      4.052005e-01

9044  9044   2.456117e+04       2.442957e+04      5.386818e-01

9949  9949   2.703839e+04       2.691752e+04      4.490497e-01

10945 10945   2.974846e+04       2.965196e+04      3.254555e-01

12040 12040   3.270365e+04       3.265572e+04      1.467608e-01

13245 13245   3.597929e+04       3.594607e+04      9.242361e-02
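
For a sense of scale, the sketch below emulates single-precision rounding in pure Python and evaluates the same dot product in two different orders, sequentially and as a pairwise tree. It is only an assumption that a CPU and a GPU routine differ in roughly this way. The helper names are hypothetical and clAmdBlas is not called. For inputs like these, reordering alone changes the result by a tiny fraction of a percent, so errors of several percent are probably not explained by rounding alone.

```python
import random
import struct


def f32(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_sequential(a, x):
    """Left-to-right dot product, rounding every operation to float32."""
    acc = 0.0
    for ai, xi in zip(a, x):
        acc = f32(acc + f32(ai * xi))
    return acc


def dot_pairwise(a, x):
    """Tree-style reduction, similar in spirit to a parallel sum."""
    terms = [f32(ai * xi) for ai, xi in zip(a, x)]
    if not terms:
        return 0.0
    while len(terms) > 1:
        # Add neighbouring pairs; carry an unpaired last term forward.
        nxt = [f32(terms[i] + terms[i + 1])
               for i in range(0, len(terms) - 1, 2)]
        if len(terms) % 2:
            nxt.append(terms[-1])
        terms = nxt
    return terms[0]


if __name__ == "__main__":
    random.seed(0)
    n = 4096  # arbitrary size chosen for illustration
    a = [f32(random.random()) for _ in range(n)]
    x = [f32(random.random()) for _ in range(n)]
    s = dot_sequential(a, x)
    p = dot_pairwise(a, x)
    print("sequential:", s)
    print("pairwise:  ", p)
    print("percent difference:", abs(s - p) / abs(p) * 100.0)
```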
