Have you enabled the cl_khr_fp64 or cl_amd_fp64 extension?
Try with cl_amd_fp64 on the CPU:
#pragma OPENCL EXTENSION cl_amd_fp64: enable
Yes, I have this enabled; otherwise the kernel won't even compile. I tried both the Intel and the AMD driver, and the precision stays the same (very low). I also verified that the results are correct (to the precision mentioned) and that this is in fact what I am computing. When running C++ on the CPU I get 7 valid digits.
How do you evaluate your result?
Against the output of a C++ function?
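To make "valid digits" concrete, here is a minimal C++ sketch of one way to compare a kernel result against a reference value. The function name matching_digits is made up for illustration; it estimates agreement from the relative error and is not taken from the original code.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: approximate number of matching significant decimal
// digits between a computed value and a reference value.
double matching_digits(double computed, double reference) {
    if (computed == reference) {
        // Exact match: report the full decimal precision of a double (15).
        return std::numeric_limits<double>::digits10;
    }
    double scale = std::fabs(reference);
    if (scale == 0.0) {
        scale = 1.0;  // fall back to absolute error when the reference is zero
    }
    double rel_err = std::fabs(computed - reference) / scale;
    return -std::log10(rel_err);
}
```

For example, matching_digits(static_cast<double>(0.1f), 0.1) comes out a little under 8, which is what a value rounded through single precision looks like. A result of about 7 digits is therefore a hint that a float is used somewhere along the way.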
This might help identify where exactly the problem is:
Add #pragma OPENCL EXTENSION cl_amd_printf: enable to the kernel source.
printf all the intermediate results of a couple of threads.
printf the values at the equivalent positions in the loop of the reference program.