
Full Precision on Transcendental Math
michael.chu May 5, 2008 7:16 PM (in response to bayoumi)
Hi Amr,
Currently, transcendental functions such as sin/cos are handled natively by the hardware. The Radeon 3870 and FireStream 9170 have native single-precision transcendentals. What Brook+ does is cast doubles down to floats for the transcendentals and then cast them back to doubles. There may be a problem with that path, and we are having the team take a look.
In a future release, we'll be introducing an emulation path for double precision transcendentals.
Michael. 
bayoumi May 5, 2008 8:57 PM (in response to bayoumi)
Hi Michael,
I tried to use something like:
y = exp((float)(x)); where y<>, x<> are double out/in 1D streams, as well as y = (double)(exp(x)) and y = (double)(exp((float)x));
x was in the range of -1.0 to 1.0.
None of these worked. I also tried sqrt(abs(x)), pow(x,a), and log(abs(x)); I believe the problem applies to all of them.
I also noticed that there is no log10 function. There are some apps in EE which use it.
Thanks
Amr
michael.chu May 7, 2008 12:01 AM (in response to bayoumi)
Hi Amr,
Can you try using a separate variable to convert from double to float, do the operation from float to float, and then use a separate variable to convert from float back to double? I'm just trying to isolate this and see if Brook+ is misbehaving when doing the casting all at once.
i.e.
double a;
float a_float;
float b_float;
double b;
a_float = (float)a;
b_float = exp(a_float);
b = (double)b_float;
Thanks!
Michael.


bayoumi May 7, 2008 1:21 AM (in response to bayoumi)
Dear Michael,
I tried several tests:
1. The test you requested, with intermediate variables, works fine with the exp() function, and I can assign the result b after the type cast to an "out double b_out<>" without problems.
2. To test type casting on streams, I used an input stream "float a_in<>" and assigned it to a double with a type cast:
kernel void (float a_in<>, out double b_out<>){
b_out = (double)a_in;
}
This works fine as well.
3. I used:
kernel void (double a<>, out double b_out<>){
b_out = exp((float)a);
/* or b_out = (double)exp((float)a);*/
}
and this does NOT work.
It seems there is a problem with type casting around the math function (probably the function prototype or similar), NOT with the math function itself and NOT with type casting on streams.
bayoumi May 7, 2008 1:49 AM (in response to bayoumi)
To test type casting from double to float, I also tried:
kernel void (double a<>, out double b_out<>){
b_out = (double)((float)a);
}
and this works fine as well.
Amr 
bayoumi May 7, 2008 4:38 AM (in response to bayoumi)
Hi Michael,
I was able to track the problem down further: it is the type casting of the exp() output, not the input.
If you use an intermediate "dummy" float stream for the output and then do a separate type cast, everything works:
kernel void (double a_double<>, out float tmp_float<>, out double b_double_out<>){
tmp_float = exp((float)a_double);
b_double_out = (double)tmp_float;
}
and everything works OK.
If you try to use:
tmp_float = exp(a_double);
or
b_double_out = exp((float)a_double);
or
b_double_out = (double) exp((float)a_double);
Then NOTHING works.
So for now, using a dummy float stream plus a type cast of the input stream from double to float at the function input is the only workaround I was able to find.
Thanks
Amr
lpw May 14, 2008 5:01 PM (in response to bayoumi)
Another alternative is to brew your own double-precision transcendentals. If you really need the precision, this seems like the only option at the moment. I was able to port the log2(double) function from the Cephes library to IL with good results. Of course, integrating this with Brook+ is another matter altogether. That's one of the reasons I moved to CAL.
Lukasz.
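For reference, the general shape of a Cephes-style double-precision log2() can be sketched in plain C (a hypothetical reconstruction of the technique, not Lukasz's actual IL port; the name `log2_sw` and the series length are choices made for this sketch): split off the exponent with frexp, then evaluate ln(m) for the reduced mantissa via the atanh series.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of a Cephes-style double-precision log2():
 * split x = m * 2^e with frexp (m in [0.5, 1)), rebalance so that
 * m lies in [sqrt(1/2), sqrt(2)), then use
 *   ln(m) = 2*atanh(z) = 2z*(1 + z^2/3 + z^4/5 + ...),  z = (m-1)/(m+1)
 * and finally log2(x) = e + ln(m)/ln(2). */
static double log2_sw(double x)
{
    const double inv_ln2 = 1.4426950408889634;  /* 1/ln(2) */
    int e;
    double m = frexp(x, &e);        /* x = m * 2^e, m in [0.5, 1) */
    if (m < 0.7071067811865476) {   /* keep m in [sqrt(1/2), sqrt(2)) */
        m *= 2.0;
        e -= 1;
    }
    double z = (m - 1.0) / (m + 1.0);   /* |z| <= ~0.1716 */
    double z2 = z * z;
    double sum = 0.0;
    /* Horner evaluation of 1 + z^2/3 + z^4/5 + ... + z^22/23 */
    for (int n = 23; n >= 1; n -= 2)
        sum = sum * z2 + 1.0 / (double)n;
    return (double)e + 2.0 * z * sum * inv_ln2;
}
```

Production implementations (Cephes included) use a fitted rational approximation rather than a raw series, and handle zero, negative, and non-finite inputs; the sketch skips that for clarity.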

michael.chu May 21, 2008 6:35 AM (in response to lpw)
Hi Lukasz,
I have let the engineering team know that we need double precision transcendental functions for Brook+. I'm trying to get that project into the schedule.
Michael.
NurEinMensch Aug 25, 2009 6:14 PM (in response to michael.chu)
Hello Michael,
how far along is the work on the double-precision transcendentals?
They are really important for scientific problems.
Having only multiplication for double is helpful, but not the point of Stream.
Scientific problems need double; the hardware can do it,
but right now software cannot use exp(), log(), sin(), cos(), etc.!
When can we? :)
P.S.: hopefully with SDK 2...
Best wishes
Marek

riza.guntur Aug 26, 2009 1:12 AM (in response to NurEinMensch)
Yes, I'm waiting for that too.
And, if possible, more features for Brook+ (although those won't be included in SDK 2, sigh).

NurEinMensch Sep 23, 2009 6:20 AM (in response to riza.guntur)
Any news on this front?

NurEinMensch Sep 25, 2009 4:18 PM (in response to NurEinMensch)
Hi,
after reading the release notes for SDK 2.0 beta 3, I don't expect any
double exp()/sin() functionality for the GPU in 2009...
The FAQ says that double precision is optional in OpenCL (I think AMD/ATI, as an OpenCL partner, pushed for that, because it seems there is a bigger problem with realizing this option...).
So my hope that this will be possible in the near future has died. Maybe a hardware fix is needed (which I guess is not even done in RV8xx)...
That's really sad; GPU power has such potential for scientific tasks, but we cannot optimize our programs because there is no support so far...
Best wishes
Marek

NurEinMensch Sep 27, 2009 10:24 AM (in response to NurEinMensch)
Hello,
because there are no comments about future support of e.g. DP exp(), I googled a little to find hints about where the problem could lie.
I found something about an HPC 64-bit exponential function implementation in an FPGA (especially for scientific demands):
http://www.springerlink.com/content/n553027524j05066/
and other articles about lookup tables.
Now I guess that no GPU (up to RV8xx) contains a DP lookup table for exp(), sin(), etc.,
so only an expensive software-side solution, with probably worse performance, can be used on current GPUs...
I would be very happy about any reply from an ATI developer on this topic.
Best wishes
Marek

Gipsel Sep 27, 2009 11:10 PM (in response to NurEinMensch)
Originally posted by NurEinMensch: "I would be very happy about any reply from an ATI developer on this topic."
I'm not an ATI developer, but at least I've implemented my own exp() in IL for my code. It works more or less the same way as on a CPU. Those lookup tables are not stored in an on-chip ROM or anything; they are simply provided by the software. exp() is quite expensive on CPUs, too.
Furthermore, one doesn't need lookup tables at all (though one could use the constant buffer for them if one wanted). There are other implementations that use the quotient of two power series of quite low order (3, if I remember right). So one just needs an argument reduction, a few constants for the power series, the division (which is also done in software, by the way), and the ldexp instruction (which the GPU hardware supports natively). Several different implementations out there use this scheme, and it also works on GPUs. It may not be the fastest possible algorithm, but it isn't that slow either if you compare it with the CPU.
