# Floating point rounding

2020-04-10 Permalink

What’s the most common and incorrect way to round a floating point number? It’s this:

float x = (int)(x + 0.5);

The cast to `int`

truncates the fractional part. For negative numbers this yields completely bogus results. For example, −7.1 is offset to −6.6, which is in turn truncated to −6.

## Using `floor`

A simple fix for the above problem is to use the `floor`

function instead, that would consistently round towards negative infinity:

float x = floorf(x + 0.5); // -7.1 is rounded to -7.0

Unfortunately this method is *still wrong*. In fact there are two classes of numbers where it fails:

When

`x`

is an odd integer in the [8388608, 16777216) interval. For example:1.0000000 00000000 00000001 × 2^23 = 8388609

Adding 0.5 gives:

1.0000000 00000000 00000001 1 × 2^23

The mantissa is then rounded to fit in the 23 bits. The default IEEE round-to-even mode yields:

1.0000000 00000000 00000010 × 2^23 = 8388610

Flooring that returns 8388610 unchanged. The expected result is 8388609, however.

When

`x`

is the predecessor of 0.5, i.e. the biggest representable number smaller than 0.5:1.1111111 11111111 11111111 × 2^-2 = 0.49999997

Adding 0.5 gives:

1.1111111 11111111 11111111 1 × 2^-1

This is rounded to fit in 23 bits:

1.0000000 00000000 00000000 × 2^0 = 1.0

Flooring, again, returns 1 whereas the expected result is 0.

## Fixing the `floor`

One might be tempted to throw a bunch of conditions to handle the above edge-cases. Surprisingly, however, they can all be handled simultaneously by replacing 0.5 in the above calculation with its predecessor:[1]

float x = floorf(x + 0.49999997f); // for 32-bit floats double x = floor(x + 0.49999999999999989); // for 64-bit doubles

Note that this breaks ties towards +∞, rather than away from zero like the C99 functions below.

## Using C99 functions

C99 already provides multiple rounding functions that can be summarized as follows:

Away from zero | Rounding mode | |
---|---|---|

Floating point result | `round` | `nearbyint` , `rint` |

Checked integer result | `lround` | `lrint` |

Legend:

**Away from zero**—always round to nearest, and break ties away from zero.**Rounding mode**—use the rounding mode as set with`fesetround()`

, which, by default, is round to nearest integer, breaking ties to even.**Floating point result**—don’t raise exceptions, return floating point.**Checked integer result**—return an integer, and can be configured to raise an exception if the result is not representable in the integer type (`FE_INVALID`

), or if the input number is not a round number (`FE_INEXACT`

).

Notice that `rint`

is an oddball: it rounds based on the current rounding mode, returns a floating point result, but additionally can raise an inexact exception like the integer ones do. The table above suggests that `nearbyint`

should have been named `rint`

instead, in order to match the naming scheme.

## Footnotes

The floating point successor or predecessor can be calculated with

`nextafter`

.