Hi!
I have a kernel like this:
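#pragma OPENCL EXTENSION cl_khr_fp64 : enable

__kernel void vzPowx_dc(__global const double *aSrc, const int aSrcIdx,
                        const double bRe, const double bIm,
                        __global double *Dst, const int DstIdx,
                        const int Length)
{
    /* Dst[i] = aSrc[i]^(bRe + i*bIm), stored as an interleaved (re, im) pair */
    double Re, Im, XRe, XIm, ExpX;
    size_t i = get_global_id(0);
    if (i >= (size_t)Length)    /* skip work-items past the end of the data */
        return;
    XRe = aSrc[aSrcIdx + i];
    Re = log(fabs(XRe));        /* log|x| */
    Im = atan2(0.0, XRe);       /* arg(x); 0.0 avoids passing an int to the overloaded atan2 */
    XRe = Re * bRe - Im * bIm;  /* (log|x| + i*arg(x)) * b */
    XIm = Im * bRe + Re * bIm;
    ExpX = exp(XRe);
    Im = sincos(XIm, &Re);      /* Im = sin(XIm), Re = cos(XIm) */
    Re *= ExpX;
    Im *= ExpX;
    Dst[2*(DstIdx + i)] = Re;
    Dst[2*(DstIdx + i)+1] = Im;
}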
When run through AMD's OpenCL implementation with the CPU as the target device, the code produces only 4 correct digits (typical of single precision) instead of 8.
Thanks!
Atmapuri
Have you enabled the cl_khr_fp64 or cl_amd_fp64 extension?
Try cl_amd_fp64 on the CPU:
#pragma OPENCL EXTENSION cl_amd_fp64: enable
Hi!
Yes, I have it enabled; otherwise the kernel won't even compile. I tried both the Intel and the AMD driver, and the precision is the same (very low) with both. I also verified that the results are correct (to the precision mentioned) and that this is in fact what I intend to compute. When running the C++ version on the CPU I get 7 valid digits.
Thanks!
Atmapuri
How do you evaluate your result?
Against the output of a C++ function?
This might help identify exactly where the problem is:
- Enable printf with #pragma OPENCL EXTENSION cl_amd_printf : enable
- printf all the intermediate results for a couple of work-items.
- printf the values at the equivalent points in the loop of the reference program.