2 Replies Latest reply on Mar 5, 2013 6:48 AM by himanshu.gautam

    clAmdBlasSgemv Error Compared to CPU Results

    samthesaab

      I am testing the clAmdBlasSgemv and clAmdBlasDgemv routines against the standard CPU BLAS routines, and the answers differ (by more than 10% for some of the smaller matrices). Any ideas why?

       

      Attached is the single-precision code I wrote. I started from the example program in the clAmdBlas samples directory and modified it to run over a range of matrix sizes and compare the GPU answer with the CPU BLAS result.

       

      My system has the following specs.

      amd-driver-installer-catalyst-13.2-beta6-linux-x86.x86_64

      AMD-APP-SDK-v2.8-lnx64

      clAmdBlas-1.8.291

      Linux nusselt 3.4.34 #4 SMP Fri Mar 1 23:30:31 EST 2013 x86_64 x86_64 x86_64 GNU/Linux

      3x AMD Radeon HD 7900 Series  

       

      Here are the results. The code fills the matrix and vector with random values, and I am only comparing the first element of the output vector.

      nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv

          M     N   Ygpu(1)        Ycpu(1)        Percent Error(1)
      ======================================================================
         64    64   1.411686e+01   1.328439e+01   6.266490e+00
         71    71   3.106447e+01   2.828872e+01   9.812211e+00
         79    79   4.925335e+01   4.468921e+01   1.021307e+01
         88    88   6.938486e+01   6.311882e+01   9.927360e+00
         97    97   9.144693e+01   8.144869e+01   1.227550e+01
        107   107   1.166902e+02   1.032968e+02   1.296601e+01
        118   118   1.448911e+02   1.272231e+02   1.388736e+01
        130   130   1.764819e+02   1.583401e+02   1.145749e+01
        144   144   2.113761e+02   1.929213e+02   9.565993e+00
        159   159   2.497652e+02   2.324539e+02   7.447187e+00
        176   176   2.925247e+02   2.744401e+02   6.589621e+00
        194   194   3.394406e+02   3.219393e+02   5.436186e+00
        214   214   3.910286e+02   3.737681e+02   4.617962e+00
        236   236   4.464761e+02   4.299068e+02   3.854162e+00
        260   260   5.064575e+02   4.916926e+02   3.002871e+00
        287   287   5.724889e+02   5.609023e+02   2.065709e+00
        316   316   6.502740e+02   6.369617e+02   2.089981e+00
        348   348   7.394334e+02   7.256129e+02   1.904665e+00
        383   383   8.319628e+02   8.213775e+02   1.288721e+00
        422   422   9.353116e+02   9.255268e+02   1.057220e+00
        465   465   1.042261e+03   1.041738e+03   5.017624e-02
        512   512   1.165649e+03   1.167591e+03   1.663059e-01
        564   564   1.301569e+03   1.302999e+03   1.097603e-01
        621   621   1.453288e+03   1.458167e+03   3.346000e-01
        684   684   1.624285e+03   1.633837e+03   5.845990e-01
        753   753   1.812988e+03   1.821587e+03   4.720609e-01
        829   829   2.020191e+03   2.025767e+03   2.752382e-01
        913   913   2.247140e+03   2.244898e+03   9.987926e-02
       1005  1005   2.492113e+03   2.492683e+03   2.285988e-02
       1106  1106   2.761114e+03   2.760242e+03   3.159398e-02
       1217  1217   3.048123e+03   3.065077e+03   5.531300e-01
       1339  1339   3.363256e+03   3.400057e+03   1.082365e+00
       1474  1474   3.713925e+03   3.762384e+03   1.288006e+00
       1622  1622   4.103960e+03   4.161901e+03   1.392174e+00
       1785  1785   4.543989e+03   4.602355e+03   1.268171e+00
       1964  1964   5.033531e+03   5.088098e+03   1.072432e+00
       2161  2161   5.571005e+03   5.626964e+03   9.944705e-01
       2378  2378   6.160100e+03   6.224594e+03   1.036126e+00
       2616  2616   6.809108e+03   6.874081e+03   9.451832e-01
       2878  2878   7.523969e+03   7.579605e+03   7.340254e-01
       3166  3166   8.312679e+03   8.349850e+03   4.451685e-01
       3483  3483   9.177878e+03   9.232830e+03   5.951821e-01
       3832  3832   1.012808e+04   1.019026e+04   6.101776e-01
       4216  4216   1.118314e+04   1.124070e+04   5.120725e-01
       4638  4638   1.235017e+04   1.238841e+04   3.086617e-01
       5102  5102   1.364414e+04   1.369116e+04   3.434441e-01
       5613  5613   1.506476e+04   1.510302e+04   2.533318e-01
       6175  6175   1.661373e+04   1.662702e+04   7.991275e-02
       6793  6793   1.833668e+04   1.833816e+04   8.083808e-03
       7473  7473   2.023177e+04   2.017815e+04   2.657093e-01
       8221  8221   2.229929e+04   2.220930e+04   4.052005e-01
       9044  9044   2.456117e+04   2.442957e+04   5.386818e-01
       9949  9949   2.703839e+04   2.691752e+04   4.490497e-01
      10945 10945   2.974846e+04   2.965196e+04   3.254555e-01
      12040 12040   3.270365e+04   3.265572e+04   1.467608e-01
      13245 13245   3.597929e+04   3.594607e+04   9.242361e-02

       

      I ran it a second time and got exactly the same output, row for row.

        • Re: clAmdBlasSgemv Error Compared to CPU Results
          samthesaab

          I think I solved my own problem. The standard BLAS routines use column-major storage, so I should set the clAmdBlas order to match:

          static const clAmdBlasOrder order = clAmdBlasColumnMajor; // was clAmdBlasRowMajor

           

          nusselt@nusselt:~/Codes/clsgemv$ ./example_sgemv

              M     N   Ygpu(1)        Ycpu(1)        Percent Error(1)
          ======================================================================
             64    64   2.194825e+01   2.194822e+01   1.303533e-04
             71    71   3.695258e+01   3.695256e+01   6.193938e-05
             79    79   5.335306e+01   5.335305e+01   2.144974e-05
             88    88   7.178266e+01   7.178265e+01   2.125693e-05
             97    97   9.011253e+01   9.011250e+01   2.539956e-05
            107   107   1.119606e+02   1.119606e+02   2.044307e-05
            118   118   1.358870e+02   1.358869e+02   2.245806e-05
            130   130   1.670040e+02   1.670040e+02   9.136783e-06
            144   144   2.015851e+02   2.015851e+02   0.000000e+00
            159   159   2.411177e+02   2.411178e+02   6.328356e-06
            176   176   2.831040e+02   2.831039e+02   2.155928e-05
            194   194   3.306032e+02   3.306031e+02   3.692352e-05
            214   214   3.824321e+02   3.824319e+02   6.383898e-05
            236   236   4.385708e+02   4.385706e+02   4.870893e-05
            260   260   5.003567e+02   5.003564e+02   6.709086e-05
            287   287   5.695662e+02   5.695659e+02   5.358041e-05
            316   316   6.456257e+02   6.456253e+02   6.617555e-05
            348   348   7.342762e+02   7.342766e+02   4.987371e-05
            383   383   8.300407e+02   8.300411e+02   5.147288e-05
            422   422   9.341904e+02   9.341904e+02   6.533481e-06
            465   465   1.050402e+03   1.050402e+03   6.972775e-05
            512   512   1.176255e+03   1.176255e+03   1.037788e-05
            564   564   1.311663e+03   1.311663e+03   9.306530e-06
            621   621   1.466831e+03   1.466832e+03   4.993224e-05
            684   684   1.642501e+03   1.642501e+03   7.431980e-06
            753   753   1.830251e+03   1.830251e+03   6.669595e-06
            829   829   2.034431e+03   2.034431e+03   6.000220e-06
            913   913   2.253564e+03   2.253562e+03   1.191690e-04
           1005  1005   2.501351e+03   2.501346e+03   1.952074e-04
           1106  1106   2.768912e+03   2.768905e+03   2.468823e-04
           1217  1217   3.073746e+03   3.073740e+03   1.906269e-04
           1339  1339   3.408728e+03   3.408721e+03   2.005426e-04
           1474  1474   3.771056e+03   3.771048e+03   2.071705e-04
           1622  1622   4.170569e+03   4.170564e+03   1.170780e-04
           1785  1785   4.611025e+03   4.611019e+03   1.376628e-04
           1964  1964   5.096781e+03   5.096761e+03   3.927893e-04
           2161  2161   5.635651e+03   5.635627e+03   4.245451e-04
           2378  2378   6.233284e+03   6.233258e+03   4.151746e-04
           2616  2616   6.882771e+03   6.882744e+03   3.972798e-04
           2878  2878   7.588303e+03   7.588269e+03   4.439933e-04
           3166  3166   8.358554e+03   8.358509e+03   5.374389e-04
           3483  3483   9.241557e+03   9.241489e+03   7.291337e-04
           3832  3832   1.019895e+04   1.019892e+04   2.776794e-04
           4216  4216   1.124939e+04   1.124936e+04   2.777935e-04
           4638  4638   1.239710e+04   1.239707e+04   1.890567e-04
           5102  5102   1.369983e+04   1.369982e+04   7.128287e-05
           5613  5613   1.511168e+04   1.511168e+04   1.938691e-05
           6175  6175   1.663567e+04   1.663567e+04   2.348116e-05
           6793  6793   1.834676e+04   1.834681e+04   3.087219e-04
           7473  7473   2.018678e+04   2.018681e+04   1.548041e-04
           8221  8221   2.221797e+04   2.221795e+04   8.790752e-05
           9044  9044   2.443827e+04   2.443822e+04   2.077944e-04
           9949  9949   2.692621e+04   2.692617e+04   1.595799e-04
          10945 10945   2.966064e+04   2.966061e+04   1.251133e-04
          12040 12040   3.266446e+04   3.266437e+04   2.810306e-04
          13245 13245   3.595488e+04   3.595472e+04   4.345743e-04