Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am using PDL for a certain project. Other than setting infinitesimally small values to 0, is there anything else I could do to reduce floating-point errors?

Re: Floating Point Errors
by pc88mxer (Vicar) on May 27, 2008 at 20:49 UTC
    Floating point numbers are only approximate. To check for a floating point value being zero you have to check for its absolute value being smaller than some small tolerance, i.e.:
    sub is_zero { abs($_[0]) < 1e-10 }
    Likewise, two floating-point numbers should be considered equal if their difference is "zero" in this sense, with a tolerance appropriate to your application. This is something that you always have to do when you use floating-point numbers.
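
    For instance (a quick demonstration, reusing the is_zero helper above), adding 0.1 ten times does not compare equal to 1, but the difference is well within the tolerance:

    my $x = 0;
    $x += 0.1 for 1 .. 10;
    print $x == 1 ? "equal\n" : "not equal\n";            # prints "not equal"
    print is_zero($x - 1) ? "zero-ish\n" : "nonzero\n";   # prints "zero-ish"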

    For more on the subject, see the node Why is Zero not 0?

    If you need exact arithmetic, perl has support for arbitrary precision numbers in the form of big integers and big rationals. See perldoc bigint and perldoc bigrat respectively.
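
    A minimal sketch with bigrat (assuming your values really are rationals): 1/10 is stored as an exact fraction, so repeated addition stays exact:

    use bigrat;
    my $sum = 0;
    $sum += 1/10 for 1 .. 10;
    print $sum == 1 ? "exactly 1\n" : "not 1\n";   # prints "exactly 1"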

Re: Floating Point Errors
by kyle (Abbot) on May 27, 2008 at 20:56 UTC
      Well, I need to use floating-point numbers because my precision needs to be about 0.01. The project is about molecular modelling, so I am using Perl's PDL, and there are many times I need to take inverses of operators. It is highly unlikely that I can avoid floating point :/ I was just now reading the article you gave, though. Thanks.

        I think what kyle was trying to say is that you will always have errors with floating point. The only way to avoid them is to rescale all your operations to use fixed-point math. It can be done. Before system designers had math co-processor chips and Pentium uber-chips, many analog systems were designed using fixed-point math.



        starbolin is correct. I struggled through a whole semester on nothing but floating-point problems, and almost the only thing I took away from it was, "don't use floating point numbers."

        There are an infinite number of real numbers between 0 and 1, but only a finite number of them can be represented the way floats are normally stored. Out between 100 and 101, there is the same infinity of real numbers, but even fewer of them can be represented.
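
        You can watch the gaps widen with a rough sketch like this, which estimates the spacing between adjacent doubles near a value by shrinking an increment until it no longer registers:

        sub gap_near {
            my ($x) = @_;
            my $gap = 1;
            $gap /= 2 while $x + $gap / 2 != $x;
            return $gap;
        }

        printf "gap near 1:   %g\n", gap_near(1);     # about 2.2e-16
        printf "gap near 100: %g\n", gap_near(100);   # about 1.4e-14, 64 times wider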

        Looking at the number line through the filter of a float is a sea of emptiness and disappointment, I tell you!

Re: Floating Point Errors
by ikegami (Patriarch) on May 27, 2008 at 21:48 UTC

    Avoid adding and subtracting numbers that vary greatly in magnitude. For example,

    sub bad_average {
        my $acc = 0;
        $acc += $_ for @_;
        return $acc / @_;
    }

    sub better_average {
        my $acc = 0;
        $acc += $_/@_ for @_;
        return $acc;
    }

    When you do comparisons, use a tolerance. For example,

    bad:     if ($x == $y)
    abs tol: if (abs($x - $y) < $abs_tol)
    rel tol: if (abs($x - $y) < abs($x) * $rel_tol)

    Avoid accumulation of error. For example,

    bad:  for (my $x=0; $x<100; $x+=0.1) { ... }
    good: for (0..999) { my $x = $_/10; ... }

    Deal with integers instead of floats when possible. For example, work with cents instead of dollars.
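
    A minimal sketch of the cents idea (the prices are made up for illustration):

    my @prices_in_cents = (1999, 1);   # $19.99 and $0.01, kept as exact integers
    my $total_cents = 0;
    $total_cents += $_ for @prices_in_cents;
    printf "total: \$%.2f\n", $total_cents / 100;   # floats only at display time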

      Please supply a rationale for better_average.

      Update: I thought the code below would speak for itself, but here is my own rationale for posting it, and for questioning the better_average routine submitted above.

      • The only difference between bad_average and better_average is that a division by a constant is moved inside the innermost loop.
      • If this were handed to any of us as a homework problem in refactoring, would we not pull the division back out again?
      • But numerical methods are for many of us an intimidating twilight zone in which paradoxical results hold true.
      • Let us require at least one example in support of every numerical claim.
      • I claim that better_average is worse than bad_average, and here follows my supporting example.
      use strict;
      use warnings;

      sub bad_average {
          my $acc = 0;
          $acc += $_ for @_;
          return $acc / @_;
      }

      sub better_average {
          my $acc = 0;
          $acc += $_/@_ for @_;
          return $acc;
      }

      my @inputs = (1) x 65537;   # 2**16 + 1
      my $true_average = 1;

      for my $adj (qw[bad better]) {
          my $func = "${adj}_average";
          no strict 'refs';
          my $average = &$func( @inputs );
          my $diff = $average - $true_average;
          print "Error using $func is ",
                $diff ? "a nonzero number near $diff" : 0, "\n";
      }

    Your roundoff error may vary, but on my system I get

    Error using bad_average is 0
    Error using better_average is a nonzero number near 3.33066907387547e-015

        That's a fairly contrived example. better_average loses here because of compounded errors in float division. I tried using random numbers instead, and I found them more or less equivalent.

        (As an aside, it occurs to me that I could have written "( $cmp > 0 ) ? 'bad' : ( $cmp < 0 ) ? 'better' : 'tie';" as "('tie', 'bad', 'better')[$cmp]" if $cmp is 0, 1, or -1. That seems pretty obscure, though.)

        I've fooled with the various knobs a bit, and what seems to be true is that "bad" and "better" win about evenly. I thought maybe one would be better for large numbers than the other, but I haven't found that either.

        I should note that this is my first use of Math::BigFloat, so it's entirely possible my entire test is bogus. Still, I think this is a better test of each algorithm than pumping them full of ones.

        Sorry for the delay in responding. Like I said in my message to you, I wanted some time to write a proper reply. It's been a while since I've dealt with floats. Before I address your question, I want to address your snippet.

        I claim that better_average is worse than bad_average, and here follows my supporting example.

        You haven't shown that at all. Considering the difference in error of the two functions is 0.000000000000003 of the input, that makes them equally good or equally bad for that data set.

        Even if it did show bad_average to be better, your data sample is so horrible that no conclusion can be drawn from the comparison. In a discussion about minimizing the errors of dealing with floating point numbers, you used integers.

        Finally, you also haven't shown better_average is bad. Quite the opposite, an error of 0.000000000000003 of the input is rather good.

        If this were handed to any of us as a homework problem in refactoring, would we not pull the division back out again?

        When it comes to numerical methods, the order in which operations are done can have an impact on the results. For example, a*(b+c) is equal to a*b+a*c in the mathematical world, but it's not necessarily the case in the practical world. You'd be wrong to refactor such code without considering the real-world implications.
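
        A quick demonstration of the same effect with addition (floating-point addition is not associative, so regrouping changes the result):

        printf "%.17g\n", (0.1 + 0.2) + 0.3;   # 0.60000000000000009
        printf "%.17g\n", 0.1 + (0.2 + 0.3);   # 0.59999999999999998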

        Please supply a rationale for better_average.

        Because moving the division changes the magnitude of all the numbers being added by the same amount, better_average is no better than bad_average. It was a bad example. A better average might sort the numbers so that the small ones are processed first. (Beware of the signs...)
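
        A hedged sketch of that sorting idea (a heuristic only, and the caveat about signs still applies):

        sub sorted_average {
            my @by_magnitude = sort { abs($a) <=> abs($b) } @_;
            my $acc = 0;
            $acc += $_ for @by_magnitude;   # small terms first, so the accumulator doesn't swallow them
            return $acc / @by_magnitude;
        }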


        There's been some speculation as to what I was thinking. What follows explains it.

        I said "better_average is no better than bad_average", but that would be more precise if you added "for floating point numbers". For fixed point numbers, bad_average would fail miserably.

        I've been working with fixed point numbers because the target machine had a very limited amount of memory. Using fixed point saved memory since the exponent wasn't stored along with the number.

        Adding lots of numbers together would result in an overflow when dealing with fixed point numbers. For example, consider a register that can hold 5 decimal digits. (Roughly 17 bits, but easier to visualize.)

          40123 (/1000)
        + 50456 (/1000)
        + 20789 (/1000)
        -------------
        overflow

        You could convert them to a different scaling first, but that would result in a loss of precision.

        40123 (/1000) =>   04012 (/100)
        50456 (/1000) => + 05046 (/100)
        20789 (/1000) => + 02079 (/100)
                         ------------
                           11137 (/100)   (111.370 instead of 111.368)

        And if you were adding 1000 numbers,

        40123 (/1000) =>   00040 (/1)
        50456 (/1000) => + 00050 (/1)
        20789 (/1000) => + 00021 (/1)
        ...                ...
                         ----------
                           00111 (/1)   (111.000 instead of 111.368)

        By dividing first, you avoid the need to scale the numbers and therefore avoid the loss of precision.
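
        Here is a toy Perl simulation of that divide-first approach for the 5-digit register above (an illustration only: the values are integers scaled by 1000, and int() truncates the way the hardware would):

        my @vals = (40123, 50456, 20789);   # i.e. 40.123, 50.456, 20.789 (/1000)
        my $n    = @vals;                   # number of values, in scalar context

        my $acc = 0;
        for my $v (@vals) {
            $acc += int($v / $n);           # divide first; the running sum stays in range
            die "overflow" if $acc > 99999; # the register holds 5 decimal digits
        }
        printf "average: %.3f\n", $acc / 1000;   # 37.121 (exact average: 37.12266...)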

        So to answer your question, I mistakenly mixed fixed point and floating point techniques.

Re: Floating Point Errors
by starbolin (Hermit) on May 27, 2008 at 21:04 UTC

    That's a tall order. There exist many sources of error when performing floating-point numerical calculations. Whole books have been written on the subject. I would start by reading the references in the Wikipedia article.

    The most important thing is to use the right algorithm. Know the error bounds of your algorithm and know how to do a sensitivity analysis. If your code is generating infinitesimals, then you need to change your approach or use guard digits. Without guard digits, any small-difference calculation is more likely to be erroneous than correct.


      Would adding something like 00000001 to the end of my number (like 32.41200000001) help?

        I can't say, as I don't know what operations you intend to perform on your 'number'. In general, the practice of adding an epsilon to floats just moves the problem around and does nothing but, perhaps, increase roundoff errors. Representational errors in floats are not evenly distributed. Take for example the following code:

        print "Division by N\n"; for my $n (1..100) { if ( $n/100 == 0.01*$n ) { print "t" } else { print "n" } } Division by N ttttttttttttttttttttttttttttttttttntttttntttttntttttttttntttttttttttnn +tttttttttttnnttttttttttnnttttt
        In an ideal world the comparison would always be true, but for some values of $n it is false, because 0.01 and most of the quotients have no exact binary representation, so the two computations can round differently.

        Now, if you are only doing an operation once and need to check the result, it is acceptable to do the equality test like this:

        if ( abs($x - $y) < $epsilon ) { ... }   # treat $x and $y as equal

        Also consider that 1e-8 is not exactly representable in binary:

        perl -e 'our $sum; for (1..1000000) { $sum += 0.00000001 } printf "%1.20f\n", $sum'
        0.00999999999994859168
        This leads to the rule: make all constants integer multiples of a power of two, e.g.:
        perl -e 'printf "%0.20f\n", 2/0x1000000'
        0.00000011920928955078
        This is especially important if using small divisors.

        These methods fall apart if you are doing any kind of loop that allows the errors to accumulate, such as Newton's method, averages, Runge-Kutta, or least squares. In these methods the error can grow to unanticipated magnitudes and overwhelm a small, fixed epsilon. Special methods are then needed that anticipate and correct for roundoff.
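
        One such special method is compensated (Kahan) summation; a hedged sketch:

        sub kahan_sum {
            my ($sum, $c) = (0, 0);      # $c carries the low-order error lost so far
            for my $x (@_) {
                my $y = $x - $c;         # fold the correction into the incoming term
                my $t = $sum + $y;       # big + small: low bits of $y fall off...
                $c = ($t - $sum) - $y;   # ...and are recovered into $c
                $sum = $t;
            }
            return $sum;
        }

        printf "%1.20f\n", kahan_sum((0.00000001) x 1_000_000);   # far closer to 0.01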


Re: Floating Point Errors
by swampyankee (Parson) on May 28, 2008 at 15:43 UTC

    Read this and go to the NA-net home page and read all the archived back issues of the NA Digest. Numerical analysis is a large and complex field which is (imho) poorly covered in most modern CS courses (as if no interesting problems in computing involve numbers).

