
# Archives Discussions

## Floating point accuracy on GPU

Hi,

My software runs on both the CPU and the GPU, and on the GPU I get some strange results that seem related to floating-point accuracy.

Note that I have no such problem on the CPU, because intermediate values stay in the FPU's 64-bit registers and rounding is done with more precision; this happens automatically on the CPU.

My code seems correct, and I can sometimes fix the problem by using 'double' instead, but some problems remain that I can't fix on the GPU.

Has anyone run into this kind of problem? Have you found any solutions?

Thanks

4 Replies

## Re: Floating point accuracy on GPU

If your code and algorithm are correct, it should not behave that way. Have you considered things like truncation error?


## Re: Floating point accuracy on GPU

Floating point is riddled with these problems.  Anyone who deals with them will hit them on any platform.

And yes, OpenCL's precision on a GPU is not identical to the precision on a CPU.

OpenCL defines the precision required of implementations in its specification. It is up to you to verify that your maths is stable at that precision, or to find ways to account for it, such as using different maths or replacing some of the functions with your own more accurate ones.
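As one illustration of "using different maths", here is a minimal Python sketch comparing a naive 32-bit sum with compensated (Kahan) summation. It is not taken from the thread: the helper `f32` and both function names are made up for this example, and single precision is emulated by rounding a Python float to 32 bits after every operation rather than by running on a GPU.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum_f32(values):
    """Plain accumulation, rounding to 32 bits after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + v)
    return total

def kahan_sum_f32(values):
    """Compensated (Kahan) summation, also rounded to 32 bits at every step."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for v in values:
        y = f32(v - comp)
        t = f32(total + y)
        comp = f32(f32(t - total) - y)
        total = t
    return total

values = [f32(0.1)] * 100_000  # ideal sum is very close to 10000
naive = naive_sum_f32(values)
kahan = kahan_sum_f32(values)
print("naive 32-bit sum:", naive)
print("Kahan 32-bit sum:", kahan)
```

Both sums use the same 32-bit precision; only the algorithm changes. The naive sum drifts visibly away from 10000, while the compensated sum stays within a few units in the last place.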

## Re: Floating point accuracy on GPU

First of all, for the basic operations (+, -, /, *), AMD GPUs give exactly the same results as the CPU (with the exception of native double division). For fused multiply-add (mad), the accuracy is even higher than what CPUs can do.
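To see why a fused multiply-add can be more accurate, here is a small Python sketch. It is only an emulation under stated assumptions: `f32` is a hypothetical helper that rounds to 32 bits, the fused result is modelled by computing `a * b + c` exactly with `fractions.Fraction` and rounding once, and the input values are hand-picked so that the difference shows up.

```python
import struct
from fractions import Fraction

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Inputs chosen so that a * b falls exactly halfway between two 32-bit floats.
a = f32(1 + 2**-12)
b = a
c = -f32(1 + 2**-11)

# Separate multiply and add: the product is rounded before the addition.
unfused = f32(f32(a * b) + c)

# Fused multiply-add (emulated): exact product and sum, then a single rounding.
exact = Fraction(a) * Fraction(b) + Fraction(c)
fused = f32(float(exact))

print("unfused:", unfused)  # 0.0
print("fused:  ", fused)    # 2**-24, about 5.96e-08
```

The unfused version loses the low-order bit of the product when it rounds the intermediate result, so the final answer collapses to zero; the fused version keeps it.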

Most people simply forget that the CPU's x87 FPU uses 80-bit precision for its internal registers and all operations. Float/double values are only rounded to their proper size and representation when you store them in memory.

The difference is not due to some magical GPU inaccuracy; it arises because you are comparing results from 80-bit math with results from 32- or 64-bit math.
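The effect of wider intermediates can be sketched in a few lines of Python. This is an emulation, not real x87 behaviour: the 64-bit Python float stands in for the wider register, and `f32` is a hypothetical helper that models storing a value as a 32-bit float.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float (hypothetical helper)."""
    return struct.unpack("f", struct.pack("f", x))[0]

a = f32(1.0e8)  # exactly representable as a 32-bit float
b = f32(1.0)

# "CPU-like": intermediates stay in the wider format, one rounding at the end.
wide = f32((a + b) - a)

# "GPU-like": every intermediate result is rounded to 32 bits.
narrow = f32(f32(a + b) - a)

print("wide intermediates:  ", wide)    # 1.0
print("32-bit intermediates:", narrow)  # 0.0
```

Both results are correct for the precision actually used; they differ only because `a + b` does not fit in 32 bits and is rounded back to `a` in the second case.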

There are two options for getting the same results on the CPU. You can write the basic operations so that they store each result in memory before it is reused (for example, by overloading operators in C++). Or you can switch to SSE, because it doesn't use the archaic 80-bit FPU mode (you can force gcc to use SSE instead of the FPU, for example with `-mfpmath=sse -msse2`).
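The first option can be sketched with operator overloading. The post suggests C++; the version below is a Python analogue written only for illustration, and the class name `F32` is made up. Every operator rounds its result to 32 bits before it can be reused, which models a store to memory after each operation.

```python
import struct

class F32:
    """Hypothetical wrapper: each result is rounded to 32 bits before reuse."""

    def __init__(self, value):
        # Packing as a C float models storing the value in 32-bit memory.
        self.value = struct.unpack("f", struct.pack("f", float(value)))[0]

    @staticmethod
    def _wrap(other):
        return other if isinstance(other, F32) else F32(other)

    def __add__(self, other):
        return F32(self.value + F32._wrap(other).value)

    def __sub__(self, other):
        return F32(self.value - F32._wrap(other).value)

    def __mul__(self, other):
        return F32(self.value * F32._wrap(other).value)

    def __truediv__(self, other):
        return F32(self.value / F32._wrap(other).value)

    def __repr__(self):
        return f"F32({self.value!r})"

x = 2.0 ** 24  # above this, 32-bit floats can no longer represent every integer

wide = (x + 1.0) + 1.0          # plain 64-bit arithmetic
stored = (F32(x) + 1.0) + 1.0   # rounded to 32 bits after each addition

print("64-bit intermediates:", wide)    # 16777218.0
print("32-bit stores:       ", stored)  # F32(16777216.0)
```

With the wrapper, the CPU-side computation reproduces what strict 32-bit arithmetic gives, at the cost of extra rounding work on every operation.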