
    IEEE 754 Floating point division discrepancy

    liwoog
      Different results on x86, NVidia and ATI

      1.0f / 96.0f (expression evaluated at runtime as 1.0f / x, with x = 96.0)

      Gives on x86 (westmere) and NVidia:

      1.041666697711e-02 (abs error compared to 1.04166..e-02 is ~3.1e-10)

      On ATI 6970 HD, OpenCL 1.1, SDK 2.3

      1.041666604578e-02 (abs error compared to 1.04166..e-02 is ~6.2e-10)

      So the x86 and NVIDIA hardware give the correctly rounded answer (the one with the lowest error).

      What can I do to get full accuracy on the 6970 and OpenCL?
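
      A minimal kernel along the lines of the sketch below (kernel and argument names are illustrative, not taken from the actual test) is enough to reproduce the comparison; run it on each device and print the result with %.12e:

      // Each work-item divides 1.0f by the value it reads, so the device's
      // single-precision division path is what gets measured.
      __kernel void recip_test(__global const float *x, __global float *out)
      {
          size_t i = get_global_id(0);
          out[i] = 1.0f / x[i];   // x[i] = 96.0f in the case reported above
      }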

        • IEEE 754 Floating point division discrepancy
          MicahVillmow
          The error bounds for division are specified in the OpenCL spec.
          • IEEE 754 Floating point division discrepancy
            moozoo

            If you compare the ULP information in the CUDA programming guide with that in the OpenCL 1.1 spec, you will see that CUDA specifies a much lower error.

            I'm guessing that CUDA is targeted more at engineering and scientific usage than OpenCL is.
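
            To put rough numbers on that: a quick host-side check like the sketch below (the literals are the values from the first post; this is assumed C, linked with -lm) puts the 6970 result at about 0.7 ulp from the exact quotient and the x86/NVIDIA result at about 0.3 ulp, i.e. correctly rounded. Both sit comfortably inside the 2.5 ulp that the OpenCL 1.1 spec allows for single-precision division, so the 6970 result is within spec even though the x86/NVIDIA result happens to be the correctly rounded one.

            #include <math.h>
            #include <stdio.h>

            /* Measure the error of the reported results in units of last place (ulp). */
            int main(void)
            {
                double exact = 1.0 / 96.0;                        /* reference, in double */
                float  ulp   = nextafterf((float)exact, INFINITY) - (float)exact;
                float  x86nv = 1.041666697711e-02f;               /* x86 / NVIDIA result  */
                float  ati   = 1.041666604578e-02f;               /* Radeon 6970 result   */

                printf("x86/NV error: %.2f ulp\n", fabs((double)x86nv - exact) / ulp);
                printf("6970 error:   %.2f ulp\n", fabs((double)ati   - exact) / ulp);
                return 0;
            }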


              • IEEE 754 Floating point division discrepancy
                nou

                No. OpenCL targets a much wider variety of devices, so the specification must accommodate those devices' capabilities.

                  • IEEE 754 Floating point division discrepancy
                    Meteorhead

                    OpenCL aims to target CPUs, GPUs, (APUs,) mobile phones, calculators and heaps of similar low-power devices. The standard specifies the absolute minimum precision a device has to achieve.

                    We all know that OpenCL is an API built on top of CAL on AMD, just as it is built on top of CUDA on NVIDIA. If NV cards are capable of reaching a certain precision under CUDA but do not deliver the same under OpenCL, that is almost an own goal. If AMD cards have lower precision in division, then AMD has to work on that a little more. But it is not a matter of the API.

                      • IEEE 754 Floating point division discrepancy
                        moozoo


                        Originally posted by Meteorhead: We all know that OpenCL is an API built on top of CAL on AMD, just as it is built on top of CUDA on NVIDIA. If NV cards are capable of reaching a certain precision under CUDA but do not deliver the same under OpenCL, that is almost an own goal. If AMD cards have lower precision in division, then AMD has to work on that a little more. But it is not a matter of the API.


                        No, AMD has met the requirements of the API; they don't have to do any more work on the lower precision. That is what Micah was saying in his short, to-the-point answer.

                        If you program against CUDA then you are guaranteed a certain precision, and that precision is higher than what you are guaranteed with OpenCL 1.1.

                        NVIDIA or AMD might actually deliver higher precision than the OpenCL minimum, as both did prior to the 6xxx series, but you cannot count on it.

                        The correct way to handle this is exactly what liwoog did: add one step of the Newton-Raphson method (a sketch of one such step follows below).

                        It's just something you have to be aware of. Don't automatically assume a higher level of precision than the OpenCL 1.1 spec guarantees based on the hardware you just happen to have.
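
                        For reference, a single refinement step in kernel code might look roughly like the sketch below (the helper name is illustrative; the correction relies on fma(), so it pays off most on hardware with a real fused multiply-add):

                        // One Newton-Raphson style correction of a / b in single precision.
                        float div_refined(float a, float b)
                        {
                            float y = 1.0f / b;        // device reciprocal, within the spec's error bound
                            float q = a * y;           // first quotient estimate
                            float r = fma(-b, q, a);   // residual a - b*q, kept accurate by the fused multiply-add
                            return fma(r, y, q);       // corrected quotient: q + r*(1/b)
                        }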


                          • IEEE 754 Floating point division discrepancy
                            golgo_13

                            Moozoo, I think you mean accuracy, not precision.

                            I'm wondering why liwoog is using single precision at all if accuracy is a concern?  Isn't the flushing of subnormal values to zero also a problem?

                            I'd also like to mention that single precision fma only has hardware support on double-capable GPUs.  It takes a lot of work to get fma right in software (try it!).
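
                            (If it matters for a given device, both of those properties can be queried at run time. A minimal host-side sketch, assuming a cl_device_id obtained earlier from clGetDeviceIDs():)

                            #include <stdio.h>
                            #include <CL/cl.h>

                            /* Report whether single-precision denormals are preserved and whether
                             * an IEEE-style fused multiply-add is supported on this device. */
                            void print_fp_caps(cl_device_id device)
                            {
                                cl_device_fp_config cfg;
                                clGetDeviceInfo(device, CL_DEVICE_SINGLE_FP_CONFIG, sizeof cfg, &cfg, NULL);

                                printf("denormals: %s\n", (cfg & CL_FP_DENORM) ? "preserved" : "flushed to zero");
                                printf("fma:       %s\n", (cfg & CL_FP_FMA)    ? "supported" : "not supported");
                            }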

                              • IEEE 754 Floating point division discrepancy
                                liwoog

                                golgo_13:

                                 single precision = 4x the speed and half the memory

                                • IEEE 754 Floating point division discrepancy
                                  moozoo


                                   Originally posted by golgo_13: Moozoo, I think you mean accuracy, not precision.


                                  Yep, sorry I do mean accuracy.

                                  Comparing the double-precision ULP information (for all functions) between CUDA and OpenCL shows the same holds for double precision, though of course double precision is much more accurate than single precision.


                                    • IEEE 754 Floating point division discrepancy
                                      Alexium

                                       I also have a story to tell. I created a very simple, basic N-body simulation program with an absolute minimum of FP instructions. I tested it on the CPU (without OpenCL!) and it was all fine. I then ported it to OpenCL in order to test it on my AMD GPU; the kernel code was exactly the same as the corresponding function in the initial CPU code. But when I tested it, FP errors blew the simulation to hell. And though my GPU (RV770) supports double precision, it's not available via OpenCL. The interesting thing is that the same single-precision kernel written and compiled with Brook+ worked fine. That's OpenCL for you...

                                        • IEEE 754 Floating point division discrepancy
                                          genaganna


                                          Originally posted by Alexium: I also have a story to tell. I created a very simple, basic N-body simulation program with an absolute minimum of FP instructions. I tested it on the CPU (without OpenCL!) and it was all fine. I then ported it to OpenCL in order to test it on my AMD GPU; the kernel code was exactly the same as the corresponding function in the initial CPU code. But when I tested it, FP errors blew the simulation to hell. And though my GPU (RV770) supports double precision, it's not available via OpenCL. The interesting thing is that the same single-precision kernel written and compiled with Brook+ worked fine. That's OpenCL for you...


                                          Double precision is supported on RV770. Only a few math functions won't work on RV770.

                                          Could you please post your OpenCL kernel and Brook+ kernel?

                                          What happens when you run on CPU through OpenCL?
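
                                          (Side note on the RV770 doubles mentioned above: using them in a kernel also requires enabling the FP64 extension, which on AMD hardware of that generation was typically the vendor extension rather than cl_khr_fp64. A minimal sketch, assuming the device reports cl_amd_fp64 in CL_DEVICE_EXTENSIONS:)

                                          #pragma OPENCL EXTENSION cl_amd_fp64 : enable

                                          __kernel void scale(__global double *y, double a)
                                          {
                                              size_t i = get_global_id(0);
                                              y[i] *= a;   // simple double-precision work per item
                                          }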

                                            • IEEE 754 Floating point division discrepancy
                                              Alexium

                                              1) Sorry, my bad. I had a problem with double and forgot that I had solved it. But using double instead of float didn't help (which is strange).
                                              2) I couldn't get it working on the CPU. It was quite some time ago, but AFAIR I was getting a runtime error somewhere inside the OpenCL function calls. That probably (obviously?) indicates problems with my code, but still, it works OK on the GPU, and I couldn't find any mistakes, so I was lost there and stopped trying to test on the CPU.
                                              Like I said, my code is strange in that it shouldn't fail on the CPU. But I don't think it's the same error that's causing the precision problems.
                                              The Brook+ code was not written by me but by my friend; it's a bit simpler (I'm doing additional computations). When I encountered problems, I made my kernel just like his, but that didn't help much. I think I'll try to play with the code to find out what's going on, but not today, unfortunately.
                                              I realize I didn’t provide strict evidence, but I’m telling you something’s not right.
                                              Below are links to the code:
                                              http://codepaste.ru/5403/

                                              http://codepaste.ru/5402/