
Double Precision Exp Function HD5870
hazeman Jan 29, 2010 6:32 AM (in response to Pheelios)First of all there is no native double exp on rv8xx. So it must be implemented using basic operations.
In theory rv8xx has fused multiply add  but ISA documentation doesn't say if it's IEEE compliant.
This is required for first version ( faster ) of exp to work correctly.With true fma error is <1 ulp, without it >50 ulp.
Also I don't now if double fma and ldexp instructions are available now in OpenCL ( ldexp is native on radeons ).
This is probably the fastest exp implementation possible on 5xxx family (~10 mads). Also 1 mad can be removed for <2 ulp precision.
PS. This code is written in C++ to IL converter so thats why there is double1 . Also all constants like "(long double)1/(long double)std::log((long double)2)" should be precomputed and values inserted into OpenCL code.
PS2. If someone from ATI is reading this post then please correct ldexp description in IL docs  because someone who wrote it clearly didn't know how it works.
double1 exp( double1 x ) { double1 cinv_log2((long double)1/(long double)std::log((long double)2)), cinv_k(1./32.); double1 s,r,m; m = round( cinv_log2*x ); r = cinv_k*fma(log((long double)2),m,x); // this part must be done with extended precision // taylor seriers expansion s = double1((long double)1/(long double)5040); s = mad(s,r,double1((long double)1/(long double)720)); s = mad(s,r,double1((long double)1/(long double)120)); s = mad(s,r,double1((long double)1/(long double)24)); s = mad(s,r,double1((long double)1/(long double)6)); s = mad(s,r,double1((long double)1/(long double)2)); s = mad(s,r,double1((long double)1/(long double)1)); s = s*r; // s = (1+s)^32 s = mad(s,s,s+s); // this mad can be removed for <2 ulp > cinv_k=1/16 s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = s + double1(1); return ldexp(s,convert_int1(m)); }

Double Precision Exp Function HD5870
hazeman Jan 29, 2010 6:54 AM (in response to Pheelios)Now the worse part ( and longer ). Second version of exp when there is no true fused mad. This one works also on 4xxx family. The error is ~1ulp ( less <1.5 ulp in my tests ).
For most sensitive part "x  log((long double)2)*m" doubledouble technique is used to extend precision. Also I would like to point out that this is based on code from qd library ( http://crd.lbl.gov/~dhbailey/mpdist/ ). The only algorithmic modification made by me is much faster ( and probably more "accurate" ) split function.
Again the code is written in C++ to IL converter/compiler, but it is very similiar to OpenCL ( ok templates aren't ).
PS. If the ldexp isn't available atm in OpenCL it can be raplaced by 5 operations ( &,<<,+,&, ).
double1 exp( double1 x ) { double1 cinv_log2((long double)1/(long double)std::log((long double)2)), cmln20(6.931471805599452862e01), cmln21(2.319046813846299558e17), cinv_k(1./32.); double1 s,r,m,p0,p1,r0,r1; m = round( cinv_log2*x ); //r = cinv_k*(x  log(2)*m); // cmln20,cmln21 < this is log(2) in doubledouble representation dd_prod_dbl( p0, p1, cmln20, cmln21, m ); // doubledouble * double dd_sum_dbl( r0, r1, p0, p1, x ); // doubledouble + double r = cinv_k*r0; // taylor seriers expansion s = double1((long double)1/(long double)5040); s = mad(s,r,double1((long double)1/(long double)720)); s = mad(s,r,double1((long double)1/(long double)120)); s = mad(s,r,double1((long double)1/(long double)24)); s = mad(s,r,double1((long double)1/(long double)6)); s = mad(s,r,double1((long double)1/(long double)2)); s = mad(s,r,double1((long double)1/(long double)1)); s = s*r; // s = (1+s)^32 s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = mad(s,s,s+s); s = s + double1(1); return ldexp(s,convert_int1(m)); } // // doubledouble with double // operations // namespace dd { void split( const double1& a, double1& hi, double1& lo ) { // in OpenCL it should be // hi = as_double(as_uint2(a) & uint2(0xFC000000,0xFFFFFFFF)); hi = a & uint2(0xFC000000,0xFFFFFFFF); lo = a  hi; } void split( const double2& a, double2& hi, double2& lo ) { hi = a & uint4(0xFC000000,0xFFFFFFFF,0xFC000000,0xFFFFFFFF); lo = a  hi; } template<typename D> variable<D> two_prod( const variable<D>& a, const variable<D>& b, variable<D>& err) { variable<D> a_hi, a_lo, b_hi, b_lo,p; p = a*b; split(a, a_hi, a_lo); split(b, b_hi, b_lo); //err = ((a_hi * b_hi  p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo; err = mad( a_lo,b_lo, mad(a_hi,b_hi,p) + mad(a_hi,b_lo,a_lo*b_hi) ); return p; } template<typename D> variable<D> two_sum( const variable<D>& a, const variable<D>& b, variable<D>& err) { variable<D> s,bb; s = a + b; bb = s  a; err = (a  (s  bb)) + (b  bb); return s; } template<typename D> variable<D> quick_two_sum( const variable<D>& a, const variable<D>& b, variable<D>& err) { variable<D> s; s = a + b; err = b  (s  a); return s; } } // dd template<typename D> void dd_prod_dbl( variable<D>& r0, variable<D>& r1, const variable<D>& a0, const variable<D>& a1, const variable<D>& b ) { variable<D> p1,p2; p1 = dd::two_prod( a0, b, p2 ); p2 = p2 + a1*b; p1 = dd::quick_two_sum( p1, p2, p2 ); r0 = p1; r1 = p2; } template<typename D> void dd_sum_dbl( variable<D>& r0, variable<D>& r1, const variable<D>& a0, const variable<D>& a1, const variable<D>& b ) { variable<D> s1,s2; s1 = dd::two_sum( a0, b, s2 ); s2 = s2 + a1; s1 = dd::quick_two_sum( s1, s2, s2 ); r0 = s1; r1 = s2; }

Double Precision Exp Function HD5870
MicahVillmow Jan 29, 2010 2:08 PM (in response to Pheelios)hazeman,
Can you point me to what you believe is incorrect? I'll poke at the correct person to get it fixed.
Double Precision Exp Function HD5870
hazeman Jan 29, 2010 4:48 PM (in response to MicahVillmow)Here is the ldexp description from IL docs.
> Puts an exp and a mantissa into a double.
This doesn't make sense ( it's probably sentence taken out of context from someone's explanation of ldexp algorithm )
> The computation is result = source1 * 2 source
should be dst.xy = (2^src1.x)*src0.xy
> Abs and negate modifiers can be used. Output.yx or output.wz are set to the doubeither input produces a NaN.
> Overflow produces inf. Underflow produces 0.
> Src0 is a double (1.0d < src0.xy < 1.0d).
src0 doesn't have to be in this range !
> Src1.x is an integer from 1024 to +1024.
> Since the second source src1 is an integer exponent, only the neg modifier is ...
I think this is also incorrect. ( if I remember correctly neg doesn't work on ints )
To tell you the truth after reading the text I really wasn't sure if this is the instruction I need. Name was correct but description wasn't. I had to make some tests to verify what I'm dealing with .
When we are on the subject of IL docs. Please force someone to read the whole text  it has so many spelling errors, missing letters, small logical errors.
