14 Replies Latest reply on Sep 27, 2009 11:10 PM by Gipsel

    Full Precision on Transcendental Math

    bayoumi
      Hi,
      I tried a program that uses math operations such as exp, pow, log, etc.
      They seem to work only with float types.
      brcc accepts double input and output streams for such operations, but at run time I get erroneous results.
      Thanks
      Amr
        • Full Precision on Transcendental Math
          michael.chu
          Hi Amr,

          Currently, the transcendental functions such as sin/cos are handled natively by the hardware. The Radeon 3870 and FireStream 9170 have native single-precision transcendentals. What Brook+ is doing is casting doubles down to floats for the transcendentals and then casting them back to doubles. There may be some problem with that path and we are having the team take a look.

          In a future release, we'll be introducing an emulation path for double precision transcendentals.

          Michael.
          • Full Precision on Transcendental Math
            bayoumi
            Hi Michael,
            I tried to use something like:

            y = exp((float)(x)); where y<> and x<> are double output/input 1D streams, as well as y = (double)(exp(x)); and y = (double)(exp((float)x));
            x was in the range of -1.0 to 1.0.
            None of these worked. I also tried sqrt(abs(x)), pow(x,a), and log(abs(x)). I believe the same applies to all of them.
            I also noticed that there is no log10 function. Some applications in EE use it.
            Thanks
            Amr
              • Full Precision on Transcendental Math
                michael.chu
                Hi Amr,

                Can you try using a separate variable to convert from double to float, do the operation on a float to a float, and then use a separate variable to convert from a float back to a double? I'm just trying to isolate this and see if Brook+ is just misbehaving when doing the casting all at once.

                i.e.

                double a;
                float a_float;
                float b_float;
                double b;

                a_float = (float)a;
                b_float = exp(a_float);
                b = (double)b_float;


                Thanks!

                Michael.
              • Full Precision on Transcendental Math
                bayoumi
                Dear Michael
                I tried several tests:
                1- The test you requested works fine with the exp() function using intermediate variables, and I assign the result of b after type casting to an "out double b_out<>" without problems.

                2- To test type casting on streams, I used an input stream "float a_in<>" and assigned it to a double output stream with a type cast:
                kernel void (float a_in<>, out double b_out<>){
                b_out = (double)a_in;
                }
                This works fine as well

                3- I use
                kernel void (double a<>, out double b_out<>){
                b_out = exp((float)a);
                /* or b_out = (double)exp((float)a);*/
                }

                and this does NOT work.
                It seems the problem is with type casting around the math function (probably the function prototype or similar), NOT with the math function itself and NOT with type casting on streams.

                • Full Precision on Transcendental Math
                  bayoumi
                  To test type casting from double to float, I also tried :

                  kernel void (double a<>, out double b_out<>){
                  b_out = (double)((float)a);
                  }

                  and this works fine as well
                  Amr
                  • Full Precision on Transcendental Math
                    bayoumi
                    Hi Michael,
                    I was able to track the problem down further. It is the type casting of the exp() function's output, not its input.
                    If you use an intermediate "dummy" float stream for the output and then do a separate type cast, everything works.

                    kernel void (double a_double<>, out float tmp_float<>, out double b_double_out<>){
                    tmp_float = exp((float)a_double);
                    b_double_out = (double)tmp_float;
                    }

                    With this, everything works OK.
                    If you try to use:
                    tmp_float = exp(a_double);
                    or
                    b_double_out = exp((float)a_double);
                    or
                    b_double_out = (double) exp((float)a_double);

                    Then NOTHING works.
                    So for now,
                    using a dummy float stream and casting the input stream from double to float at the function input
                    is the only workaround I was able to find.
                    Thanks
                    Amr
                      • Full Precision on Transcendental Math
                        lpw

                        Another alternative is to brew your own double precision transcendentals. If you really need the precision, then this seems like the only option at the moment. I was able to port the log2(double) function from the Cephes library to IL with good results. Of course, integrating this with Brook+ is another matter altogether. That's one of the reasons I moved to CAL.

                        Lukasz.
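As a rough idea of what a home-brewed double-precision logarithm can look like, here is a sketch in plain C. It is NOT the Cephes log2 algorithm mentioned above; it swaps in a simpler scheme (exponent extraction with frexp plus the series ln(m) = 2*atanh((m-1)/(m+1))) that needs only add, multiply, and divide. The name log2_sketch is hypothetical, and only finite positive inputs are handled.

```c
#include <math.h>

/* Hypothetical sketch: log2 for finite positive doubles using only
   frexp, add, multiply, and divide. Not the Cephes implementation. */
double log2_sketch(double x)
{
    const double LOG2E   = 1.4426950408889634074;   /* 1 / ln(2) */
    const double SQRT1_2 = 0.70710678118654752440;  /* sqrt(1/2) */
    int e;
    double m, s, s2, sum;
    int k;

    if (!(x > 0.0) || isinf(x))
        return NAN;               /* sketch: other inputs are not handled */

    m = frexp(x, &e);             /* x = m * 2^e with m in [0.5, 1) */
    if (m < SQRT1_2) {            /* move m into [sqrt(1/2), sqrt(2)) */
        m *= 2.0;
        e -= 1;
    }

    s  = (m - 1.0) / (m + 1.0);   /* |s| <= about 0.172 */
    s2 = s * s;

    /* Horner evaluation of sum_{k=0}^{11} s2^k / (2k + 1) */
    sum = 0.0;
    for (k = 11; k >= 0; --k)
        sum = sum * s2 + 1.0 / (2 * k + 1);

    /* ln(m) = 2 * s * sum, then convert to base 2 and add the exponent */
    return (double)e + 2.0 * s * sum * LOG2E;
}
```

Because |s| stays below about 0.172, twelve series terms are enough for the truncation error to fall below double-precision rounding; a tuned polynomial such as the one in Cephes would need fewer operations.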

                          • Full Precision on Transcendental Math
                            michael.chu
                            Hi Lukasz,

                            I have let the engineering team know that we need double precision transcendental functions for Brook+. I'm trying to get that project into the schedule.

                            Michael.
                              • Full Precision on Transcendental Math
                                NurEinMensch

                                Hello Michael,

                                How far along is the work on the double precision transcendentals?

                                They are really important for scientific problems.

                                Having only multiplication for doubles is helpful, but it is not the goal of stream computing.

                                Scientific problems need double precision, and the hardware can do it.

                                But right now software cannot use exp(), log(), sin(), cos(), etc.!

                                When will we be able to? :-)

                                PS: hopefully with SDK 2.

                                Best wishes

                                Marek

                                  • Full Precision on Transcendental Math
                                    riza.guntur

                                    Yes I'm waiting for that too.

                                    And, if possible, more features for Brook+ (although those won't be included in SDK 2, sigh).

                                      • Full Precision on Transcendental Math
                                        NurEinMensch

                                        Any news from that side ??

                                          • Full Precision on Transcendental Math
                                            NurEinMensch

                                            Hi,

                                            After reading the release notes for SDK 2.0 beta3, I don't expect any double exp() or sin() functionality for the GPU in 2009...

                                            The FAQ says that double precision is optional in OpenCL (I think AMD/ATI pushed for that as an OpenCL partner, because there seems to be a bigger problem in implementing this option...).

                                            So my hope that this will be possible in the near future has died. Maybe a hardware fix is needed (which, I guess, has not even been done in RV8xx).

                                            That's really sad. GPUs have such potential for scientific tasks, but we cannot optimize our programs because there is no support so far...

                                            Best wishes

                                            Marek

                                              • Full Precision on Transcendental Math
                                                NurEinMensch

                                                Hello,

                                                Because there have been no comments about future support for, e.g., DP exp(), I googled a little to find hints about where the problem could be.

                                                I found something about an HPC 64-bit exponential function implementation on an FPGA (especially for scientific demands):

                                                http://www.springerlink.com/content/n553027524j05066/

                                                and other articles about lookup tables.

                                                So now I guess that no GPU (up to RV8xx) contains a DP lookup table for exp(), sin(), etc.

                                                That would mean only an expensive software solution can be used on current GPUs, probably with worse performance.

                                                I would be very happy about any reply from an ATI developer on this topic.

                                                Best wishes

                                                Marek

                                                  • Full Precision on Transcendental Math
                                                    Gipsel

                                                     

                                                    I'm not an ATI developer, but I have at least implemented my own exp() in IL for my code. It works more or less the same way as on a CPU. On CPUs those lookup tables are not stored in an on-chip ROM either; they are simply provided by the software. exp() is quite expensive on CPUs, too.

                                                    Furthermore, one doesn't need lookup tables at all (though one could use the constant buffer for them if desired). There are other implementations that use the quotient of two power series of quite low order (3, if I remember correctly). So one just needs an argument reduction, a few constants for the power series, the division (which is also done in software, by the way), and the ldexp instruction (which the GPU hardware is capable of). There are several different implementations out there using this scheme, and it also works on GPUs. Maybe it is not the fastest possible algorithm, but it isn't that slow either compared with the CPU.