My software runs on both the CPU and the GPU, and on the GPU I get some strange results that seem related to floating-point accuracy.
Note that I have no such problem on the CPU, because intermediate values stay in the FPU's 64-bit registers and are rounded with more precision; this happens automatically on the CPU.
My code seems correct, and I can sometimes fix the problem by using 'double' instead, but some problems remain that I can't fix on the GPU.
Has anyone run into problems of this kind? Have you found any solutions?
If your code and algorithm are correct, it should not be that way. Have you considered things like truncation error?
Floating point is riddled with these problems. Anyone who deals with them will hit them on any platform.
And yes, OpenCL's precision on a GPU is not identical to the precision on a CPU.
OpenCL defines the precision required of implementations in the specification; it is up to you to verify that your maths is stable at that precision, or to find ways to account for it, such as using different maths or replacing some of the functions with your own more accurate ones.
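As one illustration of "using different maths", here is a minimal sketch of compensated (Kahan) summation compared with a plain running sum. It is written in Python rather than OpenCL so that it runs anywhere. The helper `f32` and both function names are my own; the sketch assumes a kernel that rounds every intermediate result to 32 bits, which `f32` emulates by a round trip through a 4-byte float.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(values):
    """Plain running sum, rounded to float32 after every addition."""
    total = f32(0.0)
    for v in values:
        total = f32(total + f32(v))
    return total

def kahan_sum_f32(values):
    """Kahan summation: carry the rounding error of each addition forward."""
    total = f32(0.0)
    comp = f32(0.0)  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(f32(v) - comp)          # apply the correction to the new term
        t = f32(total + y)              # low-order bits of y may be lost here
        comp = f32(f32(t - total) - y)  # recover what was lost
        total = t
    return total

values = [0.1] * 100000
print("naive float32 sum:", naive_sum_f32(values))
print("Kahan float32 sum:", kahan_sum_f32(values))
```

Both functions use only single-precision additions and subtractions, so the same idea carries over to a kernel that has no fast doubles. Note that an optimizer running with relaxed floating-point settings may simplify the compensation term away.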
First of all, if we talk about basic operations (+, -, /, *), AMD GPUs give exactly the same results as the CPU (with the exception of native double div). For fused mad, the accuracy is even higher than what CPUs can do.
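To show why a fused multiply-add can be more accurate, here is a rough Python sketch that compares rounding the product and the sum separately with rounding only once at the end. The names `f32`, `mad_separate` and `mad_fused` are hypothetical. The fused version is only an emulation: the product of two float32 values is exact in a double, but the following addition can itself round in double for some inputs, so it matches a true single-rounding FMA only when that sum is exact, as it is for the values used here.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def mad_separate(a, b, c):
    """a*b + c with two roundings: one after the multiply, one after the add."""
    return f32(f32(a * b) + c)

def mad_fused(a, b, c):
    """Emulated fused a*b + c: the product is kept exact, one final rounding."""
    return f32(a * b + c)

a = f32(1.0 + 2.0 ** -12)
c = f32(-(1.0 + 2.0 ** -11))

# The exact product a*a is 1 + 2**-11 + 2**-24; the last term does not fit
# in a float32 mantissa, so the separately rounded version loses it.
print("separate:", mad_separate(a, a, c))
print("fused:   ", mad_fused(a, a, c))
```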
Most people simply forget that the CPU's FPU uses 80-bit precision for its internal registers and all operations. Only when you store float/double values in memory are they truncated to the proper size/representation.
The difference is not because of some magical GPU inaccuracy but because you are comparing results from 80-bit math with results from 32- or 64-bit math.
There are two options for getting the same results on the CPU. You can write the basic operations so that they store their results in memory before they are reused (by overloading operators in C++). Or you can switch to SSE, because it doesn't use this archaic 80-bit FPU mode (you can force gcc to use SSE instead of the FPU).
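The effect of keeping intermediates in a wider register can be sketched as follows. This is only an emulation in Python: a 64-bit double stands in for the 80-bit FPU register, and the hypothetical helper `f32` plays the role of "store the result in memory as a float". The function names and test values are mine.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def sum_wide(values):
    """Keep the accumulator wide; round to float32 only at the final store."""
    acc = 0.0
    for v in values:
        acc += f32(v)
    return f32(acc)

def sum_stored(values):
    """Round the accumulator to float32 after every operation."""
    acc = f32(0.0)
    for v in values:
        acc = f32(acc + f32(v))
    return acc

values = [1.0e8, 1.0, -1.0e8]
print("wide accumulator:     ", sum_wide(values))
print("stored after each op: ", sum_stored(values))
```

The second function is what "store results in memory before they are reused" amounts to, and it is the behaviour to compare a float kernel against. With gcc on x86, `-mfpmath=sse -msse2` selects SSE arithmetic, and `-ffloat-store` keeps floating-point variables out of the FPU registers.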
What type of computations are you performing?
If you are performing linear algebra computations like LU decomposition, a very high condition number (an ill-conditioned matrix) will magnify the GPU's rounding errors many times over, and the errors in the end results will be significantly larger. I faced this problem, and it was resolved when I used doubles. Please see my most recent post.
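A small made-up example of this magnification, as a Python sketch: a nearly singular 2x2 system whose exact solution is x = y = 1, solved by Gaussian elimination once with every step rounded to float32 and once in double. The matrix, the right-hand side and the names `f32`, `f64` and `solve_2x2` are all assumptions chosen for illustration; they are not taken from my actual code.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def f64(x):
    """Identity: Python floats are already 64-bit doubles."""
    return x

def solve_2x2(a, b, rnd):
    """Gaussian elimination on a 2x2 system, applying rnd after every step."""
    a11, a12 = rnd(a[0][0]), rnd(a[0][1])
    a21, a22 = rnd(a[1][0]), rnd(a[1][1])
    b1, b2 = rnd(b[0]), rnd(b[1])
    m = rnd(a21 / a11)                 # elimination multiplier
    a22p = rnd(a22 - rnd(m * a12))     # row 2 after elimination
    b2p = rnd(b2 - rnd(m * b1))
    y = rnd(b2p / a22p)                # back substitution
    x = rnd(rnd(b1 - rnd(a12 * y)) / a11)
    return x, y

# Nearly parallel rows give a large condition number.
A = [[1.0, 1.0], [1.0, 1.0001]]
b = [2.0, 2.0001]

print("float32:", solve_2x2(A, b, f32))
print("double: ", solve_2x2(A, b, f64))
```

Merely rounding the inputs to float32 perturbs them in about the seventh significant digit, and the ill-conditioning amplifies that into an error several orders of magnitude larger in the solution, while the double version stays close to (1, 1).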