Floating point rounding

2020-04-10 Permalink

What’s the most common and incorrect way to round a floating point number? It’s this:

float x = (int)(x + 0.5);

The cast to int truncates the fractional part. For negative numbers this yields completely bogus results. For example, −7.1 is offset to −6.6, which is in turn truncated to −6.

Using `floor`

A simple fix for the above problem is to use the floor function instead, that would consistently round towards negative infinity:

float x = floorf(x + 0.5); // -7.1 is rounded to -7.0

Unfortunately this method is still wrong. In fact there are two classes of numbers where it fails:

When x is an odd integer in the [8388608, 16777216) interval. For example:
```
1.0000000 00000000 00000001 × 2^23 = 8388609
```
Adding 0.5 gives:
```
1.0000000 00000000 00000001 1 × 2^23
```
The mantissa is then rounded to fit in the 23 bits. The default IEEE round-to-even mode yields:
```
1.0000000 00000000 00000010 × 2^23 = 8388610
```
Flooring that returns 8388610 unchanged. The expected result is 8388609, however.
When x is the predecessor of 0.5, i.e. the biggest representable number smaller than 0.5:
```
1.1111111 11111111 11111111 × 2^-2 = 0.49999997
```
Adding 0.5 gives:
```
1.1111111 11111111 11111111 1 × 2^-1
```
This is rounded to fit in 23 bits:
```
1.0000000 00000000 00000000 × 2^0 = 1.0
```
Flooring, again, returns 1 whereas the expected result is 0.

Fixing `floor`

One might be tempted to throw a bunch of conditions to handle the above edge-cases. Surprisingly, however, they can all be handled simultaneously by replacing 0.5 in the above calculation with its predecessor:[1]

float x = floorf(x + 0.49999997f); // for 32-bit floats
double x = floor(x + 0.49999999999999989); // for 64-bit doubles

Note that this breaks ties towards +∞, rather than away from zero like the C99 functions below.

Using C99 functions

C99 already provides multiple rounding functions that can be summarized as follows:

	Away from zero	Rounding mode
Floating point result	`round`	`nearbyint`, `rint`
Checked integer result	`lround`	`lrint`

Legend:

Away from zero—always round to nearest, and break ties away from zero.
Rounding mode—use the rounding mode as set with fesetround(), which, by default, is round to nearest integer, breaking ties to even.
Floating point result—don’t raise exceptions, return floating point.
Checked integer result—return an integer, and can be configured to raise an exception if the result is not representable in the integer type (FE_INVALID), or if the input number is not a round number (FE_INEXACT).

Notice that rint is an oddball: it rounds based on the current rounding mode, returns a floating point result, but additionally can raise an inexact exception like the integer ones do. The table above suggests that nearbyint should have been named rint instead, in order to match the naming scheme.

Footnotes

The floating point successor or predecessor can be calculated with nextafter.

Floating point rounding

Using floor

Fixing floor

Using C99 functions

Footnotes

Using `floor`

Fixing `floor`