4 Replies Latest reply on Jan 29, 2010 4:48 PM by hazeman

    Double Precision Exp Function HD5870

    Pheelios

      I am a postgrad student doing quantum field theory monte carlo simulations and I have recently acquired an HD5870 card not realizing it didn't have full double precision math support yet.  The main function I need from the library is "exp".

      In principle I thought one could write one's own code for "exp" out of the basic arithmetic operators (that have double support).  The thing is I did and it gave the required result (running on CPU) but mine was orders of magnitude slower than the math library function.

      Is there anyway one can write a full speed exponential function prior to full double precision support, possibly using CAL if it wouldn't require a massive undertaking.

        • Double Precision Exp Function HD5870
          hazeman

          First of all there is no native double exp on rv8xx. So it must be implemented using basic operations.

          In theory rv8xx has fused multiply add - but ISA documentation doesn't say if it's IEEE compliant.

          This is required for first version ( faster ) of exp to work correctly.With true fma error is <1 ulp, without it >50 ulp.

          Also I don't now if double fma and ldexp instructions are available now in OpenCL ( ldexp is native on radeons ).

          This is probably the fastest exp implementation possible on 5xxx family (~10 mads). Also 1 mad can be removed for <2 ulp precision.

          PS. This code is written in C++ to IL converter so thats why there is double1 . Also all constants like "(long double)1/(long double)std::log((long double)2)" should be precomputed and values inserted into OpenCL code.

          PS2. If someone from ATI is reading this post then please correct ldexp description in IL docs - because someone who wrote it clearly didn't know how it works.

          double1 exp( double1 x ) { double1 cinv_log2((long double)1/(long double)std::log((long double)2)), cinv_k(1./32.); double1 s,r,m; m = round( cinv_log2*x ); r = cinv_k*fma(-log((long double)2),m,x); // this part must be done with extended precision // taylor seriers expansion s = double1((long double)1/(long double)5040); s = mad(s,r,double1((long double)1/(long double)720)); s = mad(s,r,double1((long double)1/(long double)120)); s = mad(s,r,double1((long double)1/(long double)24)); s = mad(s,r,double1((long double)1/(long double)6)); s = mad(s,r,double1((long double)1/(long double)2)); s = mad(s,r,double1((long double)1/(long double)1)); s = s*r; // s = (1+s)^32 s = mad(s,s,s+s); // this mad can be removed for <2 ulp -> cinv_k=1/16 s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = s + double1(1); return ldexp(s,convert_int1(m)); }

          • Double Precision Exp Function HD5870
            hazeman

            Now the worse part ( and longer ). Second version of exp when there is no true fused mad. This one works also on 4xxx family. The error is ~1ulp ( less <1.5 ulp in my tests ).

            For most sensitive part "x - log((long double)2)*m" double-double technique is used to extend precision. Also I would like to point out that this is based on code from qd library ( http://crd.lbl.gov/~dhbailey/mpdist/ ). The only algorithmic modification made by me is much faster ( and probably more "accurate" ) split function.

            Again the code is written in C++ to IL converter/compiler, but it is very similiar to OpenCL ( ok templates aren't ).

            PS. If the ldexp isn't available atm in OpenCL it can be raplaced by 5 operations ( &,<<,+,&,| ).

             

            double1 exp( double1 x ) { double1 cinv_log2((long double)1/(long double)std::log((long double)2)), cmln20(-6.931471805599452862e-01), cmln21(-2.319046813846299558e-17), cinv_k(1./32.); double1 s,r,m,p0,p1,r0,r1; m = round( cinv_log2*x ); //r = cinv_k*(x - log(2)*m); // cmln20,cmln21 <- this is -log(2) in double-double representation dd_prod_dbl( p0, p1, cmln20, cmln21, m ); // double-double * double dd_sum_dbl( r0, r1, p0, p1, x ); // double-double + double r = cinv_k*r0; // taylor seriers expansion s = double1((long double)1/(long double)5040); s = mad(s,r,double1((long double)1/(long double)720)); s = mad(s,r,double1((long double)1/(long double)120)); s = mad(s,r,double1((long double)1/(long double)24)); s = mad(s,r,double1((long double)1/(long double)6)); s = mad(s,r,double1((long double)1/(long double)2)); s = mad(s,r,double1((long double)1/(long double)1)); s = s*r; // s = (1+s)^32 s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = s + double1(1); return ldexp(s,convert_int1(m)); } // // double-double with double // operations // namespace dd { void split( const double1& a, double1& hi, double1& lo ) { // in OpenCL it should be // hi = as_double(as_uint2(a) & uint2(0xFC000000,0xFFFFFFFF)); hi = a & uint2(0xFC000000,0xFFFFFFFF); lo = a - hi; } void split( const double2& a, double2& hi, double2& lo ) { hi = a & uint4(0xFC000000,0xFFFFFFFF,0xFC000000,0xFFFFFFFF); lo = a - hi; } template<typename D> variable<D> two_prod( const variable<D>& a, const variable<D>& b, variable<D>& err) { variable<D> a_hi, a_lo, b_hi, b_lo,p; p = a*b; split(a, a_hi, a_lo); split(b, b_hi, b_lo); //err = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo; err = mad( a_lo,b_lo, mad(a_hi,b_hi,-p) + mad(a_hi,b_lo,a_lo*b_hi) ); return p; } template<typename D> variable<D> two_sum( const variable<D>& a, const variable<D>& b, variable<D>& err) { variable<D> s,bb; s = a + b; bb = s - a; err = (a - (s - bb)) + (b - bb); return s; } template<typename D> variable<D> quick_two_sum( const variable<D>& a, const variable<D>& b, variable<D>& err) { variable<D> s; s = a + b; err = b - (s - a); return s; } } // dd template<typename D> void dd_prod_dbl( variable<D>& r0, variable<D>& r1, const variable<D>& a0, const variable<D>& a1, const variable<D>& b ) { variable<D> p1,p2; p1 = dd::two_prod( a0, b, p2 ); p2 = p2 + a1*b; p1 = dd::quick_two_sum( p1, p2, p2 ); r0 = p1; r1 = p2; } template<typename D> void dd_sum_dbl( variable<D>& r0, variable<D>& r1, const variable<D>& a0, const variable<D>& a1, const variable<D>& b ) { variable<D> s1,s2; s1 = dd::two_sum( a0, b, s2 ); s2 = s2 + a1; s1 = dd::quick_two_sum( s1, s2, s2 ); r0 = s1; r1 = s2; }

            • Double Precision Exp Function HD5870
              MicahVillmow
              hazeman,
              Can you point me to what you believe is incorrect? I'll poke at the correct person to get it fixed.
                • Double Precision Exp Function HD5870
                  hazeman

                  Here is the ldexp description from IL docs.

                  > Puts an exp and a mantissa into a double.

                  This doesn't make sense ( it's probably sentence taken out of context from someone's explanation of ldexp algorithm )

                  > The computation is result = source1 * 2 source

                  should be dst.xy = (2^src1.x)*src0.xy

                  > Abs and negate modifiers can be used. Output.yx or output.wz are set to the doubeither input produces a NaN.

                  > Overflow produces inf. Underflow produces 0.

                  > Src0 is a double (-1.0d < src0.xy < 1.0d).

                  src0 doesn't have to be in this range !

                  > Src1.x is an integer from -1024 to +1024.

                  > Since the second source src1 is an integer exponent, only the neg modifier is ...

                  I think this is also incorrect. ( if I remember correctly neg doesn't work on ints )

                  To tell you the truth after reading the text I really wasn't sure if this is the instruction I need. Name was correct but description wasn't.  I had to make some tests to verify what I'm dealing with .

                  When we are on the subject of IL docs. Please force someone to read the whole text - it has so many spelling errors, missing letters, small logical errors.