
04-15-2022 06:04 PM

Precise decimal calculations in float

**Float 0.1 + float 0.2 == float 0.3 is true!**

Don't believe it? It is true nonetheless. You only need to compare the correctly rounded sum float 0.1 + float 0.2 with the correctly rounded float 0.3.

At https://www.onlinegdb.com/zrrXaiyOo you can see C++ code in which the correctly rounded sum float 0.111111E-21 + float 0.222222E-21 is compared bit by bit with the correctly rounded float 0.333333E-21. If desired, the decimal inputs of the program can be replaced by any others with mantissa <= 9999999 and exponent |E| <= 48.

Correct rounding, in our case, means that for every binary y (float) = x (decimal) there exists a y' (float) = x' (decimal) such that x' ≈ Z, where Z is a decimal floating-point number with N significant digits in the mantissa, and N is the number of significant digits to which all calculations are performed. In other words, after each operation the result is rounded to N significant digits of the mantissa. N is specified by the user and is a global constant in the program. The parentheses in y' (float) and x' (decimal) indicate that y' is a float and x' is a decimal.

The number y' is called the binary equivalent of the decimal number Z, and arithmetic whose arguments are binary equivalents of decimal numbers is called the arithmetic of binary equivalents.

Why is it needed?

- Correct rounding allows you to obtain an exact zero when subtracting.
- Binary equivalents of decimal numbers are compared trivially: for any value of the binary exponent, the binary mantissas are compared bit by bit.
- For a chosen decimal precision N, a fixed, optimal number of binary digits in the operational registers suffices. For example, for N = 3, only 11 bits are required for the mantissa of a binary floating-point number.
- If some binary number with p significant digits is the binary equivalent of a decimal Z with N significant digits, then another number with p' > p binary significant digits is also correctly rounded to the same Z, provided the first p significant digits of the two numbers coincide. This makes converting a number from one binary format to another straightforward.

And that's not all. The arithmetic of binary equivalents is fully consistent with ordinary arithmetic on decimal numbers of limited precision. It is therefore free of artificial constructs, and the question of portability does not arise.

The author is looking for interested partners for deeper development of the proposed theory, as well as for the development of applications to computational problems. Criticism and questions are welcome.

You can get acquainted with the basic ideas of binary-equivalent arithmetic at

https://doi.org/10.36227/techrxiv.19294511.v2
