Stannum

Floating point rounding

2020-04-10

What’s the most common and incorrect way to round a floating point number? It’s this:

float x = (int)(x + 0.5);

The cast to int truncates the fractional part, that is, it rounds towards zero. For negative numbers this yields completely bogus results. For example, −7.1 is offset to −6.6, which is in turn truncated to −6 instead of the expected −7.
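The behaviour described above is easy to reproduce. The following is a minimal sketch; the helper names round_by_cast and show_cast are made up for illustration and are not part of any library.

```c
#include <stdio.h>

/* The common but incorrect idiom: add 0.5, then truncate with an int cast.
   The cast rounds toward zero, so negative inputs are pushed the wrong way. */
static float round_by_cast(float x)
{
    return (float)(int)(x + 0.5f);
}

/* Hypothetical helper: prints the input next to the result of the idiom. */
static void show_cast(float x)
{
    printf("%g -> %g\n", x, round_by_cast(x));
}
```

With this sketch, round_by_cast(7.1f) gives 7 as expected, but round_by_cast(-7.1f) gives -6 and round_by_cast(-7.6f) gives -7, where the nearest integers are -7 and -8.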

Using floor

A simple fix for the above problem is to use the floor function instead, which consistently rounds towards negative infinity:

float x = floorf(x + 0.5f); // -7.1 is rounded to -7.0

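Unfortunately this method is still wrong. In fact there are two classes of numbers where it fails:

  1. The predecessor of 0.5 (0.49999997 for 32-bit floats). The sum x + 0.5 is not representable and rounds up to exactly 1.0, so the result is 1 instead of 0.
  2. Odd integers whose magnitude is so large that adjacent floating point values are 1 apart (between 2^23 and 2^24 for 32-bit floats, between 2^52 and 2^53 for 64-bit doubles). Here x + 0.5 is a tie that rounds to the even neighbor, so the result is x + 1 instead of x.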

Fixing the floor

One might be tempted to add a bunch of conditions to handle the above edge cases. Surprisingly, however, they can all be handled simultaneously by replacing 0.5 in the above calculation with its predecessor:[1]

float x = floorf(x + 0.49999997f); // for 32-bit floats
double x = floor(x + 0.49999999999999994); // for 64-bit doubles

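Note that this breaks ties towards +∞, rather than away from zero like the C99 round and lround functions below. The one exception is −0.5: the sum −0.5 + 0.49999997 is exactly representable and slightly negative, so floor returns −1 (the same happens for doubles).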

Using C99 functions

C99 already provides multiple rounding functions that can be summarized as follows:

                        Away from zero  Rounding mode
Floating point result   round           nearbyint, rint
Checked integer result  lround          lrint

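Legend:

Away from zero: rounds to the nearest integer and breaks ties away from zero, regardless of the current rounding mode.
Rounding mode: rounds according to the current rounding mode (see fegetround and fesetround), which at program startup is to nearest.
Floating point result: the result has the same floating point type as the argument.
Checked integer result: the result is a long (or long long for the ll-prefixed variants). If the rounded value does not fit, the result is unspecified and a domain or range error may be reported.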

Notice that rint is an oddball: it rounds based on the current rounding mode, returns a floating point result, but additionally can raise an inexact exception like the integer ones do. The table above suggests that nearbyint should have been named rint instead, in order to match the naming scheme.

Footnotes

  1. The floating point successor or predecessor can be calculated with nextafter.
