3 Replies Latest reply on Jul 9, 2008 2:06 PM by bayoumi

    double precision FP again

      I have seen several threads about issues with doubles (including my own), and I am getting confused. To summarize:
      1- Transcendentals such as exp, log, pow, sin, etc. are still computed in float and then cast to double
      2- One topic said that brcc in general truncates doubles in both kernels and C++ code, and proposed a workaround
      3- In my own tests, basic add/sub/mul/div with brcc did not show any precision errors compared to regular 64-bit CPUs. This differs from what was reported in #2 above
      4- I never got brcc with brt mode = CPU to work when I use doubles as constants, or even when I typecast float constants to doubles. It could be me ...

      Can someone summarize the whole issue?

      1- Is this an SDK issue (i.e. will be fixed anytime soon) or a hardware issue with the ASIC architecture (i.e. no solution soon)?
      2- Are there any workarounds or new releases that can give us "real" double precision FP in all kernels (transcendentals and basic math)?

      I have seen people in these posts doing matrix factorizations (myself included), and those tend to be very sensitive to precision for large-scale problems.


        • double precision FP again

          Hi Amr,

          I am not from AMD, but here is my understanding of the issues you mentioned:

          1- Yes, there is currently no implementation of double precision transcendentals, either in hardware or in software. Hopefully AMD will release a software implementation soon...

          2- The topic you mentioned was about floating-point constants in the code. That is, if you write

          double a = 1.23456789012345;

          it will be truncated to something like

          double a = 1.2345679f;

          Passing double variables using function parameters, stream or constant memory should work correctly.
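To make the effect in point 2 concrete, here is a minimal host-side sketch in plain C++ (not Brook+ kernel code). The helper name `through_float` is hypothetical; it only imitates a compiler that stores a double literal as a single-precision constant and then widens it again, which is the behaviour described above.

```cpp
#include <cmath>

// Hypothetical helper: imitates a double literal being stored as a float
// constant and then widened back to double, as described in the thread.
double through_float(double x) {
    return static_cast<double>(static_cast<float>(x));
}

// Absolute error introduced by the round trip through single precision.
double truncation_error(double x) {
    return std::fabs(through_float(x) - x);
}
```

Calling `truncation_error(1.23456789012345)` gives a nonzero value, whereas a value that is exactly representable in single precision, such as 0.5, survives unchanged. Passing the same value in as a function parameter keeps it a true double on the host, which matches the workaround suggested above.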

          3- From my own tests on a 3850, it appears that the double precision MAD is actually fused (computes a*b+c exactly, then rounds at the end), though I have not seen it mentioned anywhere. So you might end up with a result more precise than on the CPU if you use it.
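One way to tell a fused MAD from a separately rounded multiply and add is to choose inputs whose exact product falls halfway between two doubles. The sketch below is plain host-side C++, not Brook+ code; the function names and the probe values are my own assumptions, and the same three inputs would have to be fed to a kernel to test the GPU itself.

```cpp
#include <cmath>

// Multiply, round to double, then add. The volatile temporary keeps the
// compiler from contracting the two operations into a single FMA.
double mad_unfused(double a, double b, double c) {
    volatile double p = a * b;
    return p + c;
}

// Fused multiply-add: a*b + c is computed exactly and rounded once.
double mad_fused(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Hypothetical probe inputs: with e = 2^-27, (1 + e) * (1 - e) = 1 - 2^-54,
// which lies exactly halfway between two neighbouring doubles.
struct MadProbe { double a, b, c; };

MadProbe mad_probe() {
    const double e = std::ldexp(1.0, -27);
    return {1.0 + e, 1.0 - e, -1.0};
}
```

With these inputs and IEEE round-to-nearest-even, the unfused version rounds the product up to 1.0 and returns 0, while the fused version returns -2^-54. A nonzero result from the hardware MAD on the same inputs would therefore point to a fused implementation.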

          DP division is implemented in software using the fused MAD (FMA), but does not seem to provide correct rounding in every case, so it may yield different results than the CPU.
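A quotient read back from the GPU can be checked against correct rounding on the CPU by comparing residuals, again using an FMA. This is only a sketch under stated assumptions: `is_nearest_quotient` is a hypothetical helper, it assumes round-to-nearest, and it ignores overflow, underflow, zero divisors and NaNs.

```cpp
#include <cmath>
#include <limits>

// Returns true if q is at least as close to a/b as both of its neighbouring
// doubles. The residual a - q*b is evaluated with a single rounding via fma.
bool is_nearest_quotient(double a, double b, double q) {
    const double inf = std::numeric_limits<double>::infinity();
    const double lo = std::nextafter(q, -inf);  // next double below q
    const double hi = std::nextafter(q, inf);   // next double above q
    const double r    = std::fabs(std::fma(-q,  b, a));
    const double r_lo = std::fabs(std::fma(-lo, b, a));
    const double r_hi = std::fabs(std::fma(-hi, b, a));
    return r <= r_lo && r <= r_hi;
}
```

For example, `is_nearest_quotient(1.0, 3.0, 1.0 / 3.0)` holds on a CPU with IEEE division, while the same call with the quotient moved one ulp upward fails. Running it over quotients produced by the GPU would show how often the software division misses the correctly rounded result.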


            • double precision FP again

              Hi there,

              I did ask about rounding modes a little while ago in this post.

              The suggestion there was that mads weren't fused but that the multiplication wasn't truncated either (i.e. a mad is just the same as a rounded mul followed by a rounded add). I tested this just recently for single precision and this was indeed what happened, but I didn't try double precision. If it is fused then that'd be great!

            • double precision FP again
              Thank you everyone for your replies.
              I also hope someone from AMD will clear up the issues you have raised, especially how the actual hardware handles both transcendental and basic DP math.
              BTW, is the 4870 expected to bring any change in the DP FP math hardware?