Floating point rounding
2020-04-10 Permalink
What’s the most common and incorrect way to round a floating point number? It’s this:
float x = (int)(x + 0.5);
The cast to int
truncates the fractional part. For negative numbers this yields completely bogus results. For example, −7.1 is offset to −6.6, which is in turn truncated to −6.
Using floor
A simple fix for the above problem is to use the floor
function instead, that would consistently round towards negative infinity:
float x = floorf(x + 0.5); // -7.1 is rounded to -7.0
Unfortunately this method is still wrong. In fact there are two classes of numbers where it fails:
When
x
is an odd integer in the [8388608, 16777216) interval. For example:1.0000000 00000000 00000001 × 2^23 = 8388609
Adding 0.5 gives:
1.0000000 00000000 00000001 1 × 2^23
The mantissa is then rounded to fit in the 23 bits. The default IEEE round-to-even mode yields:
1.0000000 00000000 00000010 × 2^23 = 8388610
Flooring that returns 8388610 unchanged. The expected result is 8388609, however.
When
x
is the predecessor of 0.5, i.e. the biggest representable number smaller than 0.5:1.1111111 11111111 11111111 × 2^-2 = 0.49999997
Adding 0.5 gives:
1.1111111 11111111 11111111 1 × 2^-1
This is rounded to fit in 23 bits:
1.0000000 00000000 00000000 × 2^0 = 1.0
Flooring, again, returns 1 whereas the expected result is 0.
Fixing floor
One might be tempted to throw a bunch of conditions to handle the above edge-cases. Surprisingly, however, they can all be handled simultaneously by replacing 0.5 in the above calculation with its predecessor:[1]
float x = floorf(x + 0.49999997f); // for 32-bit floats double x = floor(x + 0.49999999999999989); // for 64-bit doubles
Note that this breaks ties towards +∞, rather than away from zero like the C99 functions below.
Using C99 functions
C99 already provides multiple rounding functions that can be summarized as follows:
Away from zero | Rounding mode | |
---|---|---|
Floating point result | round | nearbyint , rint |
Checked integer result | lround | lrint |
Legend:
- Away from zero—always round to nearest, and break ties away from zero.
- Rounding mode—use the rounding mode as set with
fesetround()
, which, by default, is round to nearest integer, breaking ties to even. - Floating point result—don’t raise exceptions, return floating point.
- Checked integer result—return an integer, and can be configured to raise an exception if the result is not representable in the integer type (
FE_INVALID
), or if the input number is not a round number (FE_INEXACT
).
Notice that rint
is an oddball: it rounds based on the current rounding mode, returns a floating point result, but additionally can raise an inexact exception like the integer ones do. The table above suggests that nearbyint
should have been named rint
instead, in order to match the naming scheme.
Footnotes
The floating point successor or predecessor can be calculated with
nextafter
.