I am running test cases on the clAmdBlasSgemv and clAmdBlasDgemv routines and comparing them against standard CPU BLAS routines, and the answers differ (by up to about 14% in the single-precision runs below). Any ideas where the error comes from?
Attached is the single-precision code I wrote. I took the example program from the clAmdBlas sample directory and modified it to run over a range of matrix sizes and compare the GPU answer to the CPU BLAS result.
My system has the following specs:
amd-driver-installer-catalyst-13.2-beta6-linux-x86.x86_64
AMD-APP-SDK-v2.8-lnx64
clAmdBlas-1.8.291
Linux nusselt 3.4.34 #4 SMP Fri Mar 1 23:30:31 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
3x AMD Radeon HD 7900 Series
Here are the results. The code fills the matrix and the vector with random values. I am only comparing the first element of the output vector.
nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv
M N Ygpu(1) Ycpu(1) Percent Error(1)
======================================================================
64 64 1.411686e+01 1.328439e+01 6.266490e+00
71 71 3.106447e+01 2.828872e+01 9.812211e+00
79 79 4.925335e+01 4.468921e+01 1.021307e+01
88 88 6.938486e+01 6.311882e+01 9.927360e+00
97 97 9.144693e+01 8.144869e+01 1.227550e+01
107 107 1.166902e+02 1.032968e+02 1.296601e+01
118 118 1.448911e+02 1.272231e+02 1.388736e+01
130 130 1.764819e+02 1.583401e+02 1.145749e+01
144 144 2.113761e+02 1.929213e+02 9.565993e+00
159 159 2.497652e+02 2.324539e+02 7.447187e+00
176 176 2.925247e+02 2.744401e+02 6.589621e+00
194 194 3.394406e+02 3.219393e+02 5.436186e+00
214 214 3.910286e+02 3.737681e+02 4.617962e+00
236 236 4.464761e+02 4.299068e+02 3.854162e+00
260 260 5.064575e+02 4.916926e+02 3.002871e+00
287 287 5.724889e+02 5.609023e+02 2.065709e+00
316 316 6.502740e+02 6.369617e+02 2.089981e+00
348 348 7.394334e+02 7.256129e+02 1.904665e+00
383 383 8.319628e+02 8.213775e+02 1.288721e+00
422 422 9.353116e+02 9.255268e+02 1.057220e+00
465 465 1.042261e+03 1.041738e+03 5.017624e-02
512 512 1.165649e+03 1.167591e+03 1.663059e-01
564 564 1.301569e+03 1.302999e+03 1.097603e-01
621 621 1.453288e+03 1.458167e+03 3.346000e-01
684 684 1.624285e+03 1.633837e+03 5.845990e-01
753 753 1.812988e+03 1.821587e+03 4.720609e-01
829 829 2.020191e+03 2.025767e+03 2.752382e-01
913 913 2.247140e+03 2.244898e+03 9.987926e-02
1005 1005 2.492113e+03 2.492683e+03 2.285988e-02
1106 1106 2.761114e+03 2.760242e+03 3.159398e-02
1217 1217 3.048123e+03 3.065077e+03 5.531300e-01
1339 1339 3.363256e+03 3.400057e+03 1.082365e+00
1474 1474 3.713925e+03 3.762384e+03 1.288006e+00
1622 1622 4.103960e+03 4.161901e+03 1.392174e+00
1785 1785 4.543989e+03 4.602355e+03 1.268171e+00
1964 1964 5.033531e+03 5.088098e+03 1.072432e+00
2161 2161 5.571005e+03 5.626964e+03 9.944705e-01
2378 2378 6.160100e+03 6.224594e+03 1.036126e+00
2616 2616 6.809108e+03 6.874081e+03 9.451832e-01
2878 2878 7.523969e+03 7.579605e+03 7.340254e-01
3166 3166 8.312679e+03 8.349850e+03 4.451685e-01
3483 3483 9.177878e+03 9.232830e+03 5.951821e-01
3832 3832 1.012808e+04 1.019026e+04 6.101776e-01
4216 4216 1.118314e+04 1.124070e+04 5.120725e-01
4638 4638 1.235017e+04 1.238841e+04 3.086617e-01
5102 5102 1.364414e+04 1.369116e+04 3.434441e-01
5613 5613 1.506476e+04 1.510302e+04 2.533318e-01
6175 6175 1.661373e+04 1.662702e+04 7.991275e-02
6793 6793 1.833668e+04 1.833816e+04 8.083808e-03
7473 7473 2.023177e+04 2.017815e+04 2.657093e-01
8221 8221 2.229929e+04 2.220930e+04 4.052005e-01
9044 9044 2.456117e+04 2.442957e+04 5.386818e-01
9949 9949 2.703839e+04 2.691752e+04 4.490497e-01
10945 10945 2.974846e+04 2.965196e+04 3.254555e-01
12040 12040 3.270365e+04 3.265572e+04 1.467608e-01
13245 13245 3.597929e+04 3.594607e+04 9.242361e-02
Here is a second run, which gives identical output:
nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv
M N Ygpu(1) Ycpu(1) Percent Error(1)
======================================================================
64 64 1.411686e+01 1.328439e+01 6.266490e+00
71 71 3.106447e+01 2.828872e+01 9.812211e+00
79 79 4.925335e+01 4.468921e+01 1.021307e+01
88 88 6.938486e+01 6.311882e+01 9.927360e+00
97 97 9.144693e+01 8.144869e+01 1.227550e+01
107 107 1.166902e+02 1.032968e+02 1.296601e+01
118 118 1.448911e+02 1.272231e+02 1.388736e+01
130 130 1.764819e+02 1.583401e+02 1.145749e+01
144 144 2.113761e+02 1.929213e+02 9.565993e+00
159 159 2.497652e+02 2.324539e+02 7.447187e+00
176 176 2.925247e+02 2.744401e+02 6.589621e+00
194 194 3.394406e+02 3.219393e+02 5.436186e+00
214 214 3.910286e+02 3.737681e+02 4.617962e+00
236 236 4.464761e+02 4.299068e+02 3.854162e+00
260 260 5.064575e+02 4.916926e+02 3.002871e+00
287 287 5.724889e+02 5.609023e+02 2.065709e+00
316 316 6.502740e+02 6.369617e+02 2.089981e+00
348 348 7.394334e+02 7.256129e+02 1.904665e+00
383 383 8.319628e+02 8.213775e+02 1.288721e+00
422 422 9.353116e+02 9.255268e+02 1.057220e+00
465 465 1.042261e+03 1.041738e+03 5.017624e-02
512 512 1.165649e+03 1.167591e+03 1.663059e-01
564 564 1.301569e+03 1.302999e+03 1.097603e-01
621 621 1.453288e+03 1.458167e+03 3.346000e-01
684 684 1.624285e+03 1.633837e+03 5.845990e-01
753 753 1.812988e+03 1.821587e+03 4.720609e-01
829 829 2.020191e+03 2.025767e+03 2.752382e-01
913 913 2.247140e+03 2.244898e+03 9.987926e-02
1005 1005 2.492113e+03 2.492683e+03 2.285988e-02
1106 1106 2.761114e+03 2.760242e+03 3.159398e-02
1217 1217 3.048123e+03 3.065077e+03 5.531300e-01
1339 1339 3.363256e+03 3.400057e+03 1.082365e+00
1474 1474 3.713925e+03 3.762384e+03 1.288006e+00
1622 1622 4.103960e+03 4.161901e+03 1.392174e+00
1785 1785 4.543989e+03 4.602355e+03 1.268171e+00
1964 1964 5.033531e+03 5.088098e+03 1.072432e+00
2161 2161 5.571005e+03 5.626964e+03 9.944705e-01
2378 2378 6.160100e+03 6.224594e+03 1.036126e+00
2616 2616 6.809108e+03 6.874081e+03 9.451832e-01
2878 2878 7.523969e+03 7.579605e+03 7.340254e-01
3166 3166 8.312679e+03 8.349850e+03 4.451685e-01
3483 3483 9.177878e+03 9.232830e+03 5.951821e-01
3832 3832 1.012808e+04 1.019026e+04 6.101776e-01
4216 4216 1.118314e+04 1.124070e+04 5.120725e-01
4638 4638 1.235017e+04 1.238841e+04 3.086617e-01
5102 5102 1.364414e+04 1.369116e+04 3.434441e-01
5613 5613 1.506476e+04 1.510302e+04 2.533318e-01
6175 6175 1.661373e+04 1.662702e+04 7.991275e-02
6793 6793 1.833668e+04 1.833816e+04 8.083808e-03
7473 7473 2.023177e+04 2.017815e+04 2.657093e-01
8221 8221 2.229929e+04 2.220930e+04 4.052005e-01
9044 9044 2.456117e+04 2.442957e+04 5.386818e-01
9949 9949 2.703839e+04 2.691752e+04 4.490497e-01
10945 10945 2.974846e+04 2.965196e+04 3.254555e-01
12040 12040 3.270365e+04 3.265572e+04 1.467608e-01
13245 13245 3.597929e+04 3.594607e+04 9.242361e-02
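I think I solved my own problem. The CPU BLAS routines assume column-major storage, while my clAmdBlas call was set to row-major, so the two libraries were reading the same buffer as different matrices. So, I should set
static const clAmdBlasOrder order = clAmdBlasColumnMajor; //clAmdBlasRowMajor;
With that change the GPU and CPU results agree to within about 1e-3 percent: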
nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv
M N Ygpu(1) Ycpu(1) Percent Error(1)
======================================================================
64 64 2.194825e+01 2.194822e+01 1.303533e-04
71 71 3.695258e+01 3.695256e+01 6.193938e-05
79 79 5.335306e+01 5.335305e+01 2.144974e-05
88 88 7.178266e+01 7.178265e+01 2.125693e-05
97 97 9.011253e+01 9.011250e+01 2.539956e-05
107 107 1.119606e+02 1.119606e+02 2.044307e-05
118 118 1.358870e+02 1.358869e+02 2.245806e-05
130 130 1.670040e+02 1.670040e+02 9.136783e-06
144 144 2.015851e+02 2.015851e+02 0.000000e+00
159 159 2.411177e+02 2.411178e+02 6.328356e-06
176 176 2.831040e+02 2.831039e+02 2.155928e-05
194 194 3.306032e+02 3.306031e+02 3.692352e-05
214 214 3.824321e+02 3.824319e+02 6.383898e-05
236 236 4.385708e+02 4.385706e+02 4.870893e-05
260 260 5.003567e+02 5.003564e+02 6.709086e-05
287 287 5.695662e+02 5.695659e+02 5.358041e-05
316 316 6.456257e+02 6.456253e+02 6.617555e-05
348 348 7.342762e+02 7.342766e+02 4.987371e-05
383 383 8.300407e+02 8.300411e+02 5.147288e-05
422 422 9.341904e+02 9.341904e+02 6.533481e-06
465 465 1.050402e+03 1.050402e+03 6.972775e-05
512 512 1.176255e+03 1.176255e+03 1.037788e-05
564 564 1.311663e+03 1.311663e+03 9.306530e-06
621 621 1.466831e+03 1.466832e+03 4.993224e-05
684 684 1.642501e+03 1.642501e+03 7.431980e-06
753 753 1.830251e+03 1.830251e+03 6.669595e-06
829 829 2.034431e+03 2.034431e+03 6.000220e-06
913 913 2.253564e+03 2.253562e+03 1.191690e-04
1005 1005 2.501351e+03 2.501346e+03 1.952074e-04
1106 1106 2.768912e+03 2.768905e+03 2.468823e-04
1217 1217 3.073746e+03 3.073740e+03 1.906269e-04
1339 1339 3.408728e+03 3.408721e+03 2.005426e-04
1474 1474 3.771056e+03 3.771048e+03 2.071705e-04
1622 1622 4.170569e+03 4.170564e+03 1.170780e-04
1785 1785 4.611025e+03 4.611019e+03 1.376628e-04
1964 1964 5.096781e+03 5.096761e+03 3.927893e-04
2161 2161 5.635651e+03 5.635627e+03 4.245451e-04
2378 2378 6.233284e+03 6.233258e+03 4.151746e-04
2616 2616 6.882771e+03 6.882744e+03 3.972798e-04
2878 2878 7.588303e+03 7.588269e+03 4.439933e-04
3166 3166 8.358554e+03 8.358509e+03 5.374389e-04
3483 3483 9.241557e+03 9.241489e+03 7.291337e-04
3832 3832 1.019895e+04 1.019892e+04 2.776794e-04
4216 4216 1.124939e+04 1.124936e+04 2.777935e-04
4638 4638 1.239710e+04 1.239707e+04 1.890567e-04
5102 5102 1.369983e+04 1.369982e+04 7.128287e-05
5613 5613 1.511168e+04 1.511168e+04 1.938691e-05
6175 6175 1.663567e+04 1.663567e+04 2.348116e-05
6793 6793 1.834676e+04 1.834681e+04 3.087219e-04
7473 7473 2.018678e+04 2.018681e+04 1.548041e-04
8221 8221 2.221797e+04 2.221795e+04 8.790752e-05
9044 9044 2.443827e+04 2.443822e+04 2.077944e-04
9949 9949 2.692621e+04 2.692617e+04 1.595799e-04
10945 10945 2.966064e+04 2.966061e+04 1.251133e-04
12040 12040 3.266446e+04 3.266437e+04 2.810306e-04
13245 13245 3.595488e+04 3.595472e+04 4.345743e-04
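To show why the storage order matters, here is a minimal plain-C sketch (no OpenCL) that multiplies the same buffer by the same vector twice, once reading it as row-major and once as column-major. The function names gemv_row_major and gemv_col_major are made up for this illustration; they are not part of clAmdBlas or any BLAS.

```c
#include <assert.h>
#include <stddef.h>

/* y = A*x for an m-by-n matrix stored row-major:
   element (i, j) lives at a[i*n + j]. */
void gemv_row_major(size_t m, size_t n, const float *a,
                    const float *x, float *y)
{
    assert(a && x && y);
    for (size_t i = 0; i < m; ++i) {
        float sum = 0.0f;
        for (size_t j = 0; j < n; ++j)
            sum += a[i * n + j] * x[j];
        y[i] = sum;
    }
}

/* Same product, but the buffer is read as column-major:
   element (i, j) lives at a[j*m + i]. */
void gemv_col_major(size_t m, size_t n, const float *a,
                    const float *x, float *y)
{
    assert(a && x && y);
    for (size_t i = 0; i < m; ++i) {
        float sum = 0.0f;
        for (size_t j = 0; j < n; ++j)
            sum += a[j * m + i] * x[j];
        y[i] = sum;
    }
}
```

For the 2x2 buffer {1, 2, 3, 4} and x = {1, 1}, the row-major reading gives y = {3, 7} and the column-major reading gives y = {4, 6}. With a random matrix the two readings are effectively a matrix and its transpose, which would explain errors of several percent rather than rounding-level differences.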
Good that you found this out.
You can still expect slight deviations, though, because floating-point results depend on the order in which the operations are carried out. A parallel decomposition changes that order, so small differences from the sequential CPU result are expected.
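As a small illustration of the ordering effect, the plain-C sketch below adds the same four floats in two different orders: left to right, and as a pairwise tree of the kind a parallel reduction might use. Both function names are hypothetical, and I am not claiming this is how clAmdBlas actually schedules its sums; it only shows that the order alone can change a single-precision result.

```c
#include <assert.h>
#include <stddef.h>

/* Left-to-right accumulation, as a simple sequential loop would do. */
float sum_sequential(const float *x, size_t n)
{
    assert(x || n == 0);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}

/* Pairwise (tree) accumulation: sum each half, then add the halves. */
float sum_pairwise(const float *x, size_t n)
{
    assert(x || n == 0);
    if (n == 0)
        return 0.0f;
    if (n == 1)
        return x[0];
    size_t half = n / 2;
    return sum_pairwise(x, half) + sum_pairwise(x + half, n - half);
}
```

With the input {1e8f, 1.0f, -1e8f, 1.0f} and ordinary IEEE single-precision arithmetic, the sequential sum is 1.0 while the pairwise sum is 0.0, because adding 1.0 to 1e8 is lost to rounding. Real data is far less extreme, so the differences show up only in the last few digits, as in the column-major results above.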