10 Replies Latest reply on Jan 14, 2010 10:05 AM by Iska91

    OpenCL inaccurate

    Iska91

      Hey everyone,

      I finally got OpenCL up and running, so I started my first tests. I calculated Pi using the Taylor series of arctan(1) (= Pi/4), but the value calculated by the GPU differs from the CPU's, even though both use the same code!
      int main:
      [code]
      cout << setprecision(10) << setiosflags(ios::fixed) << setiosflags(ios::showpoint);

      cout << "Calculating Pi using GPU (1024 serie)\n\n";

      // Initialize host application
      if (initializeHost() == 1)
          return 1;

      // Initialize OpenCL resources
      if (initializeCL() == 1)
          return 1;

      // Run the CL program; the return value is printed as the elapsed time, and 1 signals an error
      int iTime = runCLKernels();
      if (iTime == 1)
          return 1;

      // Add up the per-work-item partial sums on the host
      float fPi = 0.0f;
      for (cl_uint i = 0; i < width; i++)
      {
          fPi += output[i];
      }

      cout << "Pi is :" << fPi * 4.0f << "\n";
      cout << "Time total :" << iTime << "\n\n\n";
      system("pause");

      // Release OpenCL resources
      if (cleanupCL() == 1)
          return 1;

      // Release host resources
      cleanupHost();

      cout << "Calculating Pi using CPU (1024 serie)\n\n";

      DWORD iStart = GetTickCount();

      // CPU version of the same series: 256 * 10000 terms, running total kept in a float
      fPi = 0.0f;
      int iCount = 0;
      int iTotal = 256 * 10000;
      while (iCount < iTotal)
      {
          fPi += pow(-1.0, iCount) / (2 * iCount + 1);
          iCount++;
      }

      DWORD iEnd = GetTickCount();

      cout << "Pi is :" << fPi * 4.0f << "\n";
      cout << "Time total :" << iEnd - iStart << "\n";

      system("pause");

      return 0;
      [/code]

      Kernel
      [code]
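      __kernel void MainKernel(__global float * output)
      {
          uint xid = get_global_id(0);

          // Each work-item sums its own block of 10000 consecutive terms
          uint iCount = xid * 10000;
          float fTemp = 0.0f;
          while (iCount < (xid * 10000 + 10000))
          {
              // (-1)^iCount supplies the alternating sign; the sum is kept in a 32-bit float
              fTemp += pow((-1), iCount) / (2 * iCount + 1);
              iCount++;
          }

          // One partial sum per work-item; the host adds them up
          output[xid] = fTemp;
      }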
      [/code]

      Output:
      [code]
      Calculating Pi using GPU (1024 serie)

      Pi is: 3.1415958405
      Time total: 62

       

      Calculating Pi using CPU (1024 serie)

      Pi is: 3.1415963173
      Time total: 406
      [/code]

      Does anyone know what causes this error? It grows when I make the series longer.


      Regards,
      Iska
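For reference, the alternating series estimation theorem bounds how far the truncated series itself can be from Pi, independent of any rounding. With S_N the sum of the first N terms (the CPU loop above uses N = 256 × 10000):

```latex
S_N = \sum_{k=0}^{N-1} \frac{(-1)^k}{2k+1},
\qquad
\left| \frac{\pi}{4} - S_N \right| \le \frac{1}{2N+1}
\quad\Longrightarrow\quad
\left| \pi - 4 S_N \right| \le \frac{4}{2N+1}

N = 256 \times 10000 = 2\,560\,000:
\qquad
\frac{4}{2N+1} = \frac{4}{5\,120\,001} \approx 7.8 \times 10^{-7}
```

So the truncation part of the error only shrinks as the series gets longer. The printed GPU and CPU values (3.1415958405 and 3.1415963173) lie about 3.2 × 10^-6 and 3.7 × 10^-6 above Pi = 3.14159265..., more than this bound (assuming the GPU run used at least as many terms), which points to rounding in the 32-bit accumulation rather than to the series as the source of the discrepancy and of its growth.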



        • OpenCL inaccurate
          hazeman

          The FPU on the x86 architecture internally uses 80-bit arithmetic. The GPU, on the other hand, uses 32 bits all the time. That is most probably the cause of your problem.
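Note that the CPU loop computes each term in double (via pow) but stores the running total in the float fPi, so the total is also rounded to 32 bits after each step unless the compiler keeps it in an 80-bit register. A minimal C++ sketch to isolate that effect, assuming float arithmetic is evaluated in single precision (as with SSE on x86-64); the function names are illustrative, not from the original program:

```cpp
#include <cassert>
#include <cmath>

// Both functions sum the first n terms of 1 - 1/3 + 1/5 - ...
// Each term is computed in double, as in the CPU loop; only the type
// of the running total differs.

static float leibnizFloatTotal(int n) {
    assert(n > 0);
    float sum = 0.0f;
    for (int k = 0; k < n; ++k) {
        double term = 1.0 / (2.0 * k + 1.0);
        // Like "fPi += ..." with a float fPi: add in double, round back to float.
        sum = static_cast<float>(sum + ((k & 1) ? -term : term));
    }
    return sum;
}

static double leibnizDoubleTotal(int n) {
    assert(n > 0);
    double sum = 0.0;
    for (int k = 0; k < n; ++k) {
        double term = 1.0 / (2.0 * k + 1.0);
        sum += (k & 1) ? -term : term;
    }
    return sum;
}
```

Multiplying both results by 4 for n = 256 * 10000 and printing them with setprecision(10), as the original program does, shows how much of the deviation from Pi comes from the 32-bit running total alone.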

          To improve accuracy, use double, or more advanced techniques such as quad-float or double-double arithmetic (google QD, the C++/Fortran-90 double-double and quad-double package).
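When doubles are not available, the double-double idea can be applied one level down as "float-float": keep the running total as an unevaluated sum hi + lo of two floats and use Knuth's TwoSum to capture the rounding error of every addition exactly. The sketch below is plain C++ with illustrative names, not the QD package's API. It assumes strict IEEE single-precision evaluation (no -ffast-math, no x87 80-bit intermediates); a kernel port would likewise have to avoid options such as -cl-fast-relaxed-math, which allow these operations to be reassociated.

```cpp
#include <cassert>
#include <cmath>

// "Float-float" value: the double-double technique applied to floats.
struct FloatFloat {
    float hi;  // leading part of the value
    float lo;  // rounding error that does not fit in hi
};

// Knuth's TwoSum: s = fl(a + b) and err = (a + b) - s, both exactly.
static void twoSum(float a, float b, float &s, float &err) {
    s = a + b;
    float bb = s - a;
    err = (a - (s - bb)) + (b - bb);
}

// Add a plain float y to the accumulator x and renormalize.
static FloatFloat addFloat(FloatFloat x, float y) {
    float s, e;
    twoSum(x.hi, y, s, e);  // exact: x.hi + y == s + e
    e += x.lo;              // fold in the previously carried error
    FloatFloat r;
    twoSum(s, e, r.hi, r.lo);
    return r;
}

// Leibniz partial sum of n float terms, accumulated in float-float.
static double leibnizFloatFloat(int n) {
    assert(n > 0);
    FloatFloat acc = {0.0f, 0.0f};
    for (int k = 0; k < n; ++k) {
        float term = ((k & 1) ? -1.0f : 1.0f) / static_cast<float>(2 * k + 1);
        acc = addFloat(acc, term);
    }
    // Combined in double only to inspect the result on the host.
    return static_cast<double>(acc.hi) + static_cast<double>(acc.lo);
}
```

Each addition now costs about a dozen float operations instead of one, and each term is still rounded to float when it is computed; what improves is the running total, which is the part that drifts over millions of additions.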

            • OpenCL inaccurate
              Iska91

              I found that float has fairly low precision, but OpenCL (on an ATI 4850) fails to build when I use doubles. Is there a way to fix this?

                • OpenCL inaccurate
                  omkaranathan

                  Double precision is not currently supported.

                    • OpenCL inaccurate
                      Iska91

                      I noticed.

                      But doesn't that mean I'm stuck, and with me a whole lot of scientific applications that require high accuracy?

                        • OpenCL inaccurate
                          eduardoschardong

                          The GPU and CPU not getting the same results may not be a precision problem at all (and, by the way, the CPU may not be using 80-bit intermediate results either), but the ugly fact that, with floats or doubles, addition and multiplication are not associative: a + b + c may differ from c + b + a. The algorithms you are comparing are also slightly different: on the CPU you sum from the first term to the last, while on the GPU you sum in blocks and then sum the results of the blocks. With floats, this small difference is enough to yield different results.
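A small C++ sketch of both points, assuming float expressions are evaluated in single precision (FLT_EVAL_METHOD == 0, e.g. SSE on x86-64) rather than in x87 80-bit registers. The function names are illustrative; the blocked version mimics the thread's kernel (one partial sum per work-item, added up on the host) but runs on the CPU.

```cpp
#include <cassert>
#include <cmath>

// (a + b) + c
static float addLeftFirst(float a, float b, float c) {
    float ab = a + b;
    return ab + c;
}

// a + (b + c): same operands, different grouping.
static float addRightFirst(float a, float b, float c) {
    float bc = b + c;
    return a + bc;
}

// CPU-style order: one running float sum from the first term to the last.
static float leibnizSequential(unsigned n) {
    float sum = 0.0f;
    for (unsigned k = 0; k < n; ++k)
        sum += ((k & 1u) ? -1.0f : 1.0f) / static_cast<float>(2 * k + 1);
    return sum;
}

// GPU-style order: each "work-item" b sums its own block of blockSize
// terms, then the block results are added together.
static float leibnizBlocked(unsigned blocks, unsigned blockSize) {
    float total = 0.0f;
    for (unsigned b = 0; b < blocks; ++b) {
        float partial = 0.0f;
        for (unsigned k = b * blockSize; k < (b + 1) * blockSize; ++k)
            partial += ((k & 1u) ? -1.0f : 1.0f) / static_cast<float>(2 * k + 1);
        total += partial;
    }
    return total;
}
```

With a = 1e8f, b = -1e8f, c = 1.0f, the first grouping gives 1 while the second gives 0, because -1e8 + 1 rounds back to -1e8 in float (the spacing between floats near 1e8 is 8). Comparing leibnizSequential(256 * 10000) with leibnizBlocked(256, 10000) shows how much the summation order alone contributes; the exact difference depends on the terms, so it is not hard-coded here.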


                          Also, if you are trying to compute a lot of digits of pi, double precision will not be enough either, and you will probably not like Taylor series anymore.
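Much of the slowness comes from evaluating the arctan series at x = 1: by the alternating-series bound 4/(2N+1), an error of 10^-14 in pi would take roughly 2 × 10^14 terms. As an illustration (a swapped-in technique, not something proposed in the thread), Machin's formula pi/4 = 4 arctan(1/5) - arctan(1/239) uses the same Taylor series at small arguments, where about a dozen terms already reach double precision. Past roughly 15-16 digits, double itself becomes the limit and an arbitrary-precision library is needed.

```cpp
#include <cassert>
#include <cmath>

// Taylor (Gregory) series arctan(x) = x - x^3/3 + x^5/5 - ..., first `terms` terms.
// Converges quickly only when |x| is small.
static double arctanSeries(double x, int terms) {
    assert(terms > 0);
    const double x2 = x * x;
    double power = x;  // holds x^(2k+1)
    double sum = 0.0;
    for (int k = 0; k < terms; ++k) {
        double term = power / (2 * k + 1);
        sum += (k % 2 == 0) ? term : -term;
        power *= x2;
    }
    return sum;
}

// Machin's formula: pi/4 = 4*arctan(1/5) - arctan(1/239).
static double machinPi(int terms) {
    return 4.0 * (4.0 * arctanSeries(1.0 / 5.0, terms) - arctanSeries(1.0 / 239.0, terms));
}
```

With 12 terms per series, the first omitted term of arctan(1/5) is about 1.3 × 10^-19, so the remaining error is dominated by double rounding rather than by truncation.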


                  • OpenCL inaccurate
                    MicahVillmow
                    If you enable the cl_khr_fp64 extension, you will get experimental support for the basic math operations on doubles, along with I/O.
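Whether a given device and driver advertise cl_khr_fp64 can be checked on the host at run time: clGetDeviceInfo with CL_DEVICE_EXTENSIONS returns a space-separated list of extension names. The helper below is a hypothetical name, written in plain C++ so it builds without an OpenCL SDK; it matches whole tokens only. The thread does not say what the 4850 driver reports, so the exact name to look for is an assumption to verify against the returned list.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// True if `name` is one of the space-separated tokens in `extensions`,
// e.g. the string obtained from clGetDeviceInfo(device, CL_DEVICE_EXTENSIONS, ...).
// Comparing whole tokens avoids the false positive a substring search
// would give when another extension name merely starts with `name`.
static bool hasExtension(const std::string &extensions, const std::string &name) {
    assert(!name.empty());
    std::istringstream tokens(extensions);
    std::string token;
    while (tokens >> token) {
        if (token == name) {
            return true;
        }
    }
    return false;
}
```

The kernel still needs its own `#pragma OPENCL EXTENSION cl_khr_fp64 : enable` line; the host-side check only decides whether to build the double kernel or fall back to the float one.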
                      • OpenCL inaccurate
                        Iska91

                        Thank you for your reply, MicahVillmow.

                        My kernel currently looks like this:

                        [code]
                        #pragma OPENCL EXTENSION cl_khr_fp64 : enable

                        __kernel void MainKernel(__global double * output)
                        {
                           uint xid = get_global_id(0);

                           uint iCount = xid * 10000;
                           double dTemp = 0.0;
                           while (iCount < (xid * 10000 + 10000))
                           {
                             dTemp += pow((-1.0), iCount) / (2 * iCount + 1);

                             iCount++;
                           }
                           output[xid] = dTemp;
                        }
                        [/code]

                        But I get the following error:
                        [code]
                        C:\Users\Jasper\AppData\Local\Temp\OCL7D6A.tml.obj:fake: (.text+0xa4): undefined reference to '__pow_f64'
                        C:\Users\Jasper\AppData\Local\Temp\OCL7D6A.tml.obj:fake: (.text+0x184): undefined reference to '__pow_f64'
                        Error: Building Program (clBuildProgram)
                        [/code]

                        Do you know what causes this error?

                        Thanks in advance
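A plausible reading of the linker error, going by the earlier reply that the experimental fp64 support covers the basic math operations: plain arithmetic on doubles is available, but the double overload of pow() has no implementation to link against, hence the unresolved __pow_f64. For this series pow() is not needed at all, because (-1)^iCount is just the parity of iCount; in the kernel the loop body could read `dTemp += ((iCount & 1) ? -1.0 : 1.0) / (2 * iCount + 1);`. The C++ sketch below mirrors one work-item's loop on the host (hypothetical function names) and checks that dropping pow() does not change the result.

```cpp
#include <cassert>
#include <cmath>

// Host-side mirror of one work-item of the double kernel.
// blockIndex stands in for get_global_id(0); blockSize is 10000 in the kernel.

// (-1)^i taken from the parity of i: no pow() call needed.
static double partialSumNoPow(unsigned blockIndex, unsigned blockSize) {
    double sum = 0.0;
    const unsigned first = blockIndex * blockSize;
    for (unsigned i = first; i < first + blockSize; ++i) {
        double term = 1.0 / (2 * i + 1);
        sum += (i & 1u) ? -term : term;
    }
    return sum;
}

// The kernel's original formulation, using pow(), for comparison.
static double partialSumWithPow(unsigned blockIndex, unsigned blockSize) {
    double sum = 0.0;
    const unsigned first = blockIndex * blockSize;
    for (unsigned i = first; i < first + blockSize; ++i) {
        sum += std::pow(-1.0, static_cast<double>(i)) / (2 * i + 1);
    }
    return sum;
}
```

Besides avoiding the missing built-in, the parity test is cheaper than a general pow() call, which matters when every work-item runs it 10000 times.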