3 Replies Latest reply on Sep 9, 2011 12:16 PM by dimkar

    BUG: Double runs as single?

    Atmapuri

      Hi!

      I have a kernel like this:

      __kernel void vzPowx_dc(__global const double *aSrc, const int aSrcIdx,
                              const double bRe, const double bIm,
                              __global double *Dst, const int DstIdx,
                              const int Length)
      {
          double Re, Im, XRe, XIm, ExpX;
          size_t i = get_global_id(0);

          XRe = aSrc[aSrcIdx + i];

          // Complex logarithm of the real input: ln|x| + i*arg(x)
          Re = log(fabs(XRe));
          Im = atan2(0, XRe);

          // Multiply the logarithm by the complex exponent (bRe + i*bIm)
          XRe = Re * bRe - Im * bIm;
          XIm = Im * bRe + Re * bIm;

          // Complex exponential: exp(XRe) * (cos(XIm) + i*sin(XIm))
          ExpX = exp(XRe);
          Im = sincos(XIm, &Re);

          Re *= ExpX;
          Im *= ExpX;

          // Store the result as interleaved (re, im) pairs
          Dst[2*(DstIdx + i)] = Re;
          Dst[2*(DstIdx + i)+1] = Im;
      }

      When run through AMD OpenCL with the CPU as the target, this code gives only 4 digits of precision (typical of single precision) instead of 8.

      Thanks!
      Atmapuri

        • BUG: Double runs as single?
          dimkar

          Have you enabled the cl_khr_fp64 or cl_amd_fp64 extension?

          Try with amd_fp64 on the CPU

          #pragma OPENCL EXTENSION cl_amd_fp64: enable

            • BUG: Double runs as single?
              Atmapuri

              Hi!

              Yes, I have this enabled; otherwise the kernel won't even compile. I tried with both the Intel and AMD drivers, and the precision is the same in both cases (very low). I also verified that the results are correct (to the precision mentioned) and that this is in fact what I am computing. When running C++ on the CPU I get 7 valid digits.

              Thanks!
              Atmapuri

                • BUG: Double runs as single?
                  dimkar

                  How do you evaluate your result?

                  Against the output of a C++ function?

                  This might help identify exactly where the problem is:

                  Add #pragma OPENCL EXTENSION cl_amd_printf : enable

                  printf all the intermediate results for a couple of threads.

                  printf the values at the equivalent positions in the loop of the reference program.