
Wednesday, November 02, 2005

What Every Computer Scientist Should Know About Floating-Point Arithmetic

So I've been thinking about floating-point numbers recently, mostly about comparing them for _relative_ equality. I know this certainly isn't a new issue for most, especially if you have worked with a "lower"-level language like C/C++, but for the average SAS programmer it may come as a surprise that 7.4 may not = 7.4! The catch is that 7.4 cannot be stored exactly at all. Written in binary, its fractional part .4 repeats forever (0.0110 0110 ...), just as 1/3 repeats forever in decimal, so the computer keeps only the nearest approximation that fits in a fixed number of bits. Two calculations that "should" both produce 7.4 can round to slightly different approximations, and then they no longer compare equal.

In everyday life we rarely care about this, because we only work to a few decimal places: an error far out past the digits we actually look at never shows up.

But in a computer, every real number is squeezed into a fixed number of bits, so precision is always an issue, as anyone who has been unfortunate enough to write code such as this has (painfully) learned:


// add a 10-cent rebate to the customer's account until they have reached
// the rebate maximum
// called by perVisit() function

struct Customer { float account; float accumulatedRebate; }; // assumed definition

const float REBATE_MAXIMUM = 2.5; // $2.50 rebate max

void addRebate( Customer &c )
{
    if ( c.accumulatedRebate == REBATE_MAXIMUM ) return;
    else
    {
        // add the ten cents to their account and update their accumulated rebate
        // so they do not go over
        c.account += .1;
        c.accumulatedRebate += .1;
    }
}

Now who's going to explain to the CEO why all the 3rd quarter revenue got eliminated in massive rebates? GULP.

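Hopefully you spotted the error. REBATE_MAXIMUM itself is exactly 2.50, since 2.5 happens to have an exact binary representation, but .1 does not, so every += .1 adds a tiny rounding error. After 25 visits accumulatedRebate is not 2.50 but something a hair off from it (with float arithmetic, roughly 2.4999998), so the == test never succeeds and the customer keeps collecting ten cents on every visit, forever.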
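One way to repair the rebate code, sketched here under my own assumptions rather than offered as the only correct fix: keep money in whole cents with an integer type, so each 10-cent step is exact, and test the cap with >= instead of ==. The field names accountCents and accumulatedRebateCents and the helper rebateAfterVisits are hypothetical, made up for this sketch.

```cpp
// Sketch: store money as integer cents so every addition is exact.
// accountCents, accumulatedRebateCents and rebateAfterVisits are
// hypothetical names invented for this example.
struct Customer {
    long long accountCents = 0;
    long long accumulatedRebateCents = 0;
};

const long long REBATE_MAXIMUM_CENTS = 250; // $2.50 rebate max
const long long REBATE_STEP_CENTS = 10;     // 10 cents per visit

// Add 10 cents to the customer's account until the rebate maximum is reached.
void addRebate(Customer &c)
{
    // >= (not ==) also stops the rebate if the total ever skips past the cap.
    if (c.accumulatedRebateCents >= REBATE_MAXIMUM_CENTS) return;
    c.accountCents += REBATE_STEP_CENTS;
    c.accumulatedRebateCents += REBATE_STEP_CENTS;
}

// Simulate a number of visits for a fresh customer and return the total
// rebate paid, in cents.
long long rebateAfterVisits(int visits)
{
    Customer c;
    for (int i = 0; i < visits; ++i) addRebate(c);
    return c.accumulatedRebateCents;
}
```

For example, rebateAfterVisits(100) returns 250: the rebate stops at $2.50 no matter how many visits follow.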

But the above code is C++, and as a SAS programmer you don't have to worry about those kinds of hairy details, right? Try this code from Data Savant Consulting (which has a nice page discussing this very issue):


data _null_;
    x = 7.3;
    x = x + 0.1;
    y = 7.4;
    if x = y then put "Duh! of course they are equal.";
    else put "Doooh! " x " and " y " are not equal!!";
run;
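To show the surprise is not SAS-specific, here is a rough C++ equivalent of the DATA step above. It assumes IEEE 754 double precision, which is what a C++ double is on common hardware and what SAS uses for numeric variables on most (though not all) platforms; the function name naiveSumEqualsLiteral is just a label for this sketch.

```cpp
#include <cstdio>

// Mirrors the SAS DATA step: add 0.1 to 7.3 and compare the result with 7.4.
// Returns true only if the two doubles are bit-for-bit equal.
bool naiveSumEqualsLiteral()
{
    double x = 7.3;
    x = x + 0.1;
    double y = 7.4;
    // %.17g prints enough digits to tell any two distinct doubles apart,
    // unlike SAS's default format, which shows both values as 7.4.
    std::printf("x = %.17g, y = %.17g\n", x, y);
    return x == y;
}
```

On IEEE 754 hardware this prints x = 7.3999999999999995, y = 7.4000000000000004 and returns false: the sum lands one step below the double nearest to 7.4.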

Then go read this!
What Every Computer Scientist Should Know About Floating-Point Arithmetic, by David Goldberg
Or if you don't have the time to grind through that, just remember comparing floats for equality is not usually a good idea.
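Comparing for _relative_ equality usually means checking whether the difference is small compared with the size of the numbers, with an absolute floor for values near zero. A minimal C++ sketch follows; the function name approximatelyEqual and the default tolerances are my own illustrative choices, not a standard API, and the right tolerance depends on your data. (Python's math.isclose applies the same relative-plus-absolute rule.)

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// True when a and b agree within a relative tolerance (scaled by the larger
// magnitude) or within an absolute tolerance (needed near zero, where a purely
// relative test breaks down). Default tolerances are illustrative assumptions.
bool approximatelyEqual(double a, double b,
                        double relTol = 1e-9, double absTol = 1e-12)
{
    assert(relTol >= 0.0 && absTol >= 0.0);
    double diff = std::fabs(a - b);
    double scale = std::max(std::fabs(a), std::fabs(b));
    return diff <= std::max(relTol * scale, absTol);
}
```

With these defaults, approximatelyEqual(7.3 + 0.1, 7.4) is true even though 7.3 + 0.1 == 7.4 is false.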

2 comments:

  1. "just remember comparing floats for equality is not usually a good idea"
    Given that SAS stores all numeric quantities as floating point (unless you're using macro values), we're really in a bind, aren't we?
    I'm surprised all my DO loops stop at the right time. Well, almost all my DO loops, anyway.
    In my experience I haven't been tripped up by comparisons of the type described here as much as by formatted values that don't line up without specifying the right FUZZ value. Also, my understanding is that the real problems arise when the errors caused by imprecise floating-point representation are compounded by iterative computations. I always suspect this as a possible culprit when the user notes report an arcane set of PROC options causing problems for infrequently used statistics. I'd be more concerned about home-baked statistics in IML or the DATA step, because SAS might not "know" enough about what you're doing to save you from yourself, whereas I suspect the fixes for those procedural black holes involve coding around the imprecision.
    Finally, deriving floating-point results from a restricted set of integer values might start such processes off on the wrong foot, as when comparing groups where one has lots of 1/3s and another has lots of 2/3s and 0/3s, and rounding errors might make the latter appear larger, especially if compounded by inappropriate use of the ROUND function.

  2. Indeed, running this code will cause a subtle problem:
    data _null_;
      x = 7.3;
      do until( x >= 7.4 );
        x = x + 0.1;
        put x= hex16.;
      end;
      /* shouldn't x = 7.4 here? */
    run;

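    Using until( x = 7.4 ) causes an infinite loop, because x goes from just below 7.4 straight to just below 7.5 and never equals 7.4 exactly. Using ( x >= 7.4 ) does end the loop, but with x close to 7.5 instead of 7.4, which is misleading at best.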

    I think your point about floats in some of the statistical procedures causing subtle problems is a good one. I don't have a statistical background, but I know some statisticians shy away from SAS and use STATA instead. In STATA you have more control over how your numbers are stored, as this link describes.

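Both comments' points can be reproduced in C++ under the same IEEE 754 double assumption. runDoUntilLoop mirrors the second comment's DO UNTIL loop (SAS checks an UNTIL condition at the bottom of each pass, which a C++ do/while on the negated condition matches), and accumulationError illustrates the first comment's point that iterative computations compound representation error. The function names and the LoopResult struct are hypothetical, invented for this sketch.

```cpp
#include <cmath>
#include <cstdio>

// Result of the loop sketch: how many passes ran and the final value of x.
struct LoopResult {
    int passes;
    double x;
};

// Mirrors the SAS loop: x = 7.3; do until(x >= 7.4); x = x + 0.1; end;
LoopResult runDoUntilLoop()
{
    double x = 7.3;
    int passes = 0;
    do {
        x = x + 0.1;
        ++passes;
        // %.17g shows enough digits to reveal that x is not 7.4.
        std::printf("pass %d: x = %.17g\n", passes, x);
    } while (!(x >= 7.4)); // the UNTIL condition is checked after each pass
    return LoopResult{passes, x};
}

// Adds 0.1 to a running total n times and returns how far the total drifts
// from the exact value n / 10. The drift tends to grow as n grows.
double accumulationError(int n)
{
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += 0.1;
    return std::fabs(sum - n / 10.0);
}
```

On IEEE 754 hardware the loop makes two passes: the first leaves x at 7.3999999999999995, just below 7.4, so the loop continues, and the second leaves it just below 7.5. That is why x >= 7.4 stops with x nowhere near 7.4, and why x = 7.4 never stops at all. accumulationError(10) is already nonzero (ten additions of 0.1 give 0.9999999999999999), and accumulationError(1000000) is many orders of magnitude larger.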