jasonk has asked for the wisdom of the Perl Monks concerning the following question:

Background

I'm working with some data files that include huge quantities of floating point numbers (GIS shapefiles, containing latitude/longitude/altitude/measurement info). The majority of the time, the numbers I'm working with are small enough for perl to handle with no problem, but occasionally (especially with measurements) I get data that gets me into trouble unless I use Math::BigFloat.


Problem

Unfortunately, due to the sheer volume of numbers being manipulated, using BigFloat for all of them has too large a performance impact (turning the run time from around 6 hours to as much as 10 days), so I'd prefer not to use BigFloat when it isn't necessary. The bright part of this is that the file headers indicate the range of numbers included in each measure, so I don't have to check the whole file, I just have to look at the header and determine if any measure has a minimum/maximum value that indicates that I should use BigFloat for that value.


Question

The problem I'm running into is that I can't figure out how I can examine a number and determine if it's going to be big enough to require BigFloat. So can anybody suggest a method for determining "how big is big?"?

Replies are listed 'Best First'.
Re: Determining when Math::BigFloat is necessary?
by CountZero (Bishop) on Feb 03, 2003 at 20:19 UTC

    Am I correct in assuming that it is not a question of how "big" or "small" the values which you are working with are, but rather how many significant digits you need to work with?

    8.98846567431158e+307 and 4.94065645841247e-324 seem large / small enough for most purposes, but if you need the full 300+ significant digits of accuracy, then Math::BigFloat or Math::BigInt are your only solution.

    If about 14 significant digits are sufficient, then you do not seem to need these packages.

    The number 14 is not at all magical. The Camel-book says:
    To convert from number to string, it does the equivalent of an sprintf(3) with a format of "%.14g" on most machines. (Programming Perl, 3ed, p. 59)

    This of course raises the question: how many significant digits are really used internally? It would take wiser monks than me to answer that.
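One way to probe the internal precision is to measure it directly. The following is a minimal sketch, not part of the original reply; the helper name `significand_bits` is made up for illustration. It halves a small increment until adding it to 1 no longer changes the result, which counts the bits in the significand of Perl's native float (NV).

```perl
use strict;
use warnings;

# Count the significand bits of Perl's native float (NV) by halving $eps
# until adding half of it to 1 no longer changes the result.
sub significand_bits {
    my $eps  = 1.0;
    my $bits = 1;                       # start at 1 for the implicit leading bit
    while (1.0 + $eps / 2 != 1.0) {
        $eps /= 2;
        $bits++;
    }
    return $bits;
}

my $bits   = significand_bits();
# Decimal digits that survive a string -> float -> string round trip.
my $digits = int(($bits - 1) * log(2) / log(10));

print "significand bits:    $bits\n";
print "safe decimal digits: $digits\n";
```

On a perl whose NV is an IEEE 754 double this should report 53 bits and 15 safe decimal digits, which agrees with the perlnumber extract quoted further down the thread. A perl built with long doubles would report more.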

    CountZero

    "If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law

Re: Determining when Math::BigFloat is necessary?
by BrowserUk (Patriarch) on Feb 03, 2003 at 22:36 UTC

    I'm not exactly sure how useful the code is, but the reference in the pod should be.

    #! perl -slw

    =pod

    Extract from [perlnumber]

    On typical hardware, floating point values can store numbers with up to
    53 binary digits, and with binary exponents between -1024 and 1024. In
    decimal representation this is close to 16 decimal digits and decimal
    exponents in the range of -304..304. The upshot of all this is that Perl
    cannot store a number like 12345678901234567 as a floating point number
    on such architectures without loss of information.

    =cut

    use strict;
    use Math::BigFloat;

    my $num = '1234567890123456789';
    my $bf  = Math::BigFloat->new();

    for my $digits (14 .. 17) {
        my $strnum = substr( $num, 0, $digits );
        $bf = Math::BigFloat->new( $strnum );
        my $double = 0 + $strnum;
        print $bf, $bf - $double ? 'Use BigFloat' : 'Use native';
    }

    __END__
    12345678901234.Use native
    123456789012345.Use native
    1234567890123456.Use BigFloat
    12345678901234567.Use BigFloat

    Examine what is said, not who speaks.

    The 7th Rule of perl club is -- pearl clubs are easily damaged. Use a diamond club instead.

Re: Determining when Math::BigFloat is necessary?
by tall_man (Parson) on Feb 03, 2003 at 19:38 UTC
    Sounds like you want to find the largest and smallest floats representable by regular perl. Here's an easy way:
    my $i = 1.0;
    my $lasti;
    for (;;) {
        $lasti = $i;
        $i *= 2;
        print $i, "\n";
        last if ($i == $lasti);
    }
    The last number printed before "inf" is the largest power of two your floats can hold (8.98846567431158e+307 on my system); the largest representable float is just under twice that.

    A similar trick with division by 2 will give you the smallest representable float (4.94065645841247e-324 on my system).

    Update: Basing the decision on the number of digits of precision in the range will not work in general. For example all the numbers could be from 0 to 1 but if you require 50 digits of precision you should use BigFloat.
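The limits can also be read without a probing loop, because Perl's POSIX module exports the constants from C's float.h. Below is a rough sketch of a range check for a header minimum or maximum. The helper name `magnitude_fits` is hypothetical, and the check uses the normalised range (`DBL_MIN` to `DBL_MAX`), so values in the denormal region near 4.94e-324 are treated as out of range because they carry reduced precision.

```perl
use strict;
use warnings;
use POSIX qw(DBL_MAX DBL_MIN);

# Hypothetical helper: true if the magnitude written in $str lies inside
# the normalised range of a native double, or is a genuine zero.
sub magnitude_fits {
    my ($str) = @_;
    my $n = abs(0 + $str);
    # A zero result is only genuine if the mantissa has no non-zero digit;
    # otherwise the conversion underflowed.
    return 1 if $n == 0 && $str !~ /^[^eE]*[1-9]/;
    return ($n >= DBL_MIN() && $n <= DBL_MAX()) ? 1 : 0;
}

for my $limit ('8.98846567431158e+307', '9e5000', '1e-5000', '0.0') {
    printf "%-22s %s\n", $limit, magnitude_fits($limit) ? 'native' : 'BigFloat';
}
```

As the update above points out, a range check like this says nothing about precision, so it can only rule native floats out, never in.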

Re: Determining when Math::BigFloat is necessary?
by pg (Canon) on Feb 04, 2003 at 02:33 UTC
    Actually you don't need to worry how big is big at all, just let Perl do everything for you.

    Here is a piece of code I came up with:
    use Data::Dumper;

    $str = $ARGV[0];
    $a = $str + 0.0;
    if ($a =~ m/inf/i) {    # case-insensitive: the spelling of infinity varies by platform and Perl version
        use bignum;
        $a = $str + 0.0;
        no bignum;
    }
    print Dumper($a);
    Play with it like this:
    perl -w test.pl 9e5000, will give you a BigInt
    perl -w test.pl 12.3, will give you a normal guy
    
    With limited testing, I cannot promise that it will work for all the situations, so try it out.
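The same idea can be sketched without matching on the printed form of infinity, by comparing against an infinite value numerically and falling back to Math::BigFloat explicitly. The helper name `parse_measure` is an assumption for illustration, not something from the reply above, and the check only catches overflow, not loss of precision.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper: return a plain Perl number when the string converts
# without overflowing, and a Math::BigFloat object otherwise.
sub parse_measure {
    my ($str) = @_;
    my $native = 0 + $str;
    my $inf    = 9**9**9;               # deliberately overflows to infinity
    return $native if abs($native) != $inf;
    return Math::BigFloat->new($str);
}

# ref() is empty for a plain number and the class name for an object.
print ref(parse_measure('12.3'))   || 'native', "\n";
print ref(parse_measure('9e5000')) || 'native', "\n";
```

Comparing numerically avoids depending on whether a given perl prints infinity as "inf", "Inf", or something else.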
Re: Determining when Math::BigFloat is necessary?
by John M. Dlugosz (Monsignor) on Feb 04, 2003 at 06:48 UTC
    That can't be answered with the information given. It's not the representation of your input numbers that matters, but the arithmetic that drops off significance and gives incorrect results. For example, if you compute c*(a-b) and all three numbers are well within the legal range and precision of floating point, you can still get the wrong answer if a and b are nearly equal and c is of a magnitude that's not nearly the same as that difference.
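A small illustration of the c*(a-b) effect, with input values chosen here purely for demonstration: each input has at most seven significant digits, yet the native result is already wrong around the tenth digit, while Math::BigFloat returns exactly 1.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Illustrative inputs (not from the original post): a and b nearly equal,
# c much larger than their difference.
my ($a_in, $b_in, $c_in) = ('1.000001', '1.000000', '1e6');

# Native doubles: the tiny representation error in a is exposed by the
# subtraction and then magnified by c.
my $native = $c_in * ($a_in - $b_in);

# Arbitrary precision: the same expression evaluated exactly.
my $exact = Math::BigFloat->new($c_in)
          * (Math::BigFloat->new($a_in) - Math::BigFloat->new($b_in));

printf "native: %.17g\n", $native;
print  "exact:  $exact\n";
```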

    Perhaps you can do some testing and see which datasets give the same answer the fast way and the slow way, then see if you can derive some empirical ideas from that.

    The formal approach would be to compute tolerances alongside the main calculation. This would cut the speed in half, but that's still nothing like the factor of 40 you have now. Then you can detect when the significance requires upgrading to BigFloat. In fact... you could write a lazybigfloat module that does this for you automatically. Then you can do some analysis and only employ it for expressions that might need it, like in the example above.
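A very rough sketch of what computing tolerances alongside the calculation might look like. Everything here is an assumption for illustration: the names `tracked`, `t_sub`, `t_mul` and `good_digits` are invented, the unit roundoff assumes IEEE doubles, and the error bounds are first-order estimates rather than rigorous interval arithmetic.

```perl
use strict;
use warnings;

my $EPS = 2**-53;    # assumed unit roundoff of an IEEE double

# A tracked value is a pair: [ value, estimated absolute error ].
sub tracked {
    my ($v) = @_;
    return [ $v, abs($v) * $EPS ];
}

# Subtraction: absolute errors add, plus one rounding of the result.
sub t_sub {
    my ($x, $y) = @_;
    my $v = $x->[0] - $y->[0];
    return [ $v, $x->[1] + $y->[1] + abs($v) * $EPS ];
}

# Multiplication: first-order propagation, plus one rounding of the result.
sub t_mul {
    my ($x, $y) = @_;
    my $v   = $x->[0] * $y->[0];
    my $err = abs($x->[0]) * $y->[1] + abs($y->[0]) * $x->[1] + abs($v) * $EPS;
    return [ $v, $err ];
}

# Estimated number of decimal digits that can still be trusted.
sub good_digits {
    my ($t) = @_;
    return 0 if $t->[0] == 0 || $t->[1] == 0;
    my $d = -log($t->[1] / abs($t->[0])) / log(10);
    return $d < 0 ? 0 : $d;
}

# c * (a - b) with a and b nearly equal.
my $result = t_mul(tracked(1e6), t_sub(tracked(1.000001), tracked(1.0)));
printf "value %.15g, about %.1f trustworthy digits\n",
    $result->[0], good_digits($result);
```

A calculation could then switch to Math::BigFloat whenever `good_digits` drops below the precision the application needs.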

    —John

Re: Determining when Math::BigFloat is necessary?
by toma (Vicar) on Feb 04, 2003 at 04:39 UTC
    Another approach to this problem would be to avoid perl's limitation in conversion of the number between string and float.

    This C code:

    #include <stdio.h>

    int main(void)
    {
        double d;
        d = 1.0;
        while (d > 0.0) {
            printf("%100.80e\n", d);
            d = d / 2.0;
        }
        return 0;
    }
    produces the exact same result as this perl code:
    my $d;
    $d = 1.0;
    while ($d > 0.0) {
        printf("%100.80e\n", $d);
        $d = $d / 2.0;
    }
    But this is completely different from this perl code, which includes a conversion to string:
    my $d;
    $d = 1.0;
    while ($d > 0.0) {
        $d = "$d";
        printf("%100.80e\n", $d);
        $d = $d / 2.0;
    }
    The output from the perl program with the string conversion is clearly messed up; it prints these two successive values:
    2.38418579101562500000000000000000000000000000000000000000000000000000000000000000e-07
    1.19209289550780998537093783879586839091757610731292515993118286132812500000000000e-07
    whereas the C code and the perl code without the string conversion have the correct result:
    2.38418579101562500000000000000000000000000000000000000000000000000000000000000000e-07
    1.19209289550781250000000000000000000000000000000000000000000000000000000000000000e-07
    It would be nice if there were a way to tell perl "don't ever convert this number to a string unless I say it's okay." Short of that, you have to either be careful to avoid string conversion, write your numerical code in XS, use BigFloat a lot, or use straight C or FORTRAN. The Inline module also makes calling C easy. C is almost as easy as perl for numerical programming anyway. The PDL module may also do a better job for what you need, especially if you can formulate your problem in terms of vectors or matrices. That way you would gain in both speed and precision!
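When a number does have to pass through a string, one way to be careful is to format it explicitly. This sketch assumes a perl whose NV is an IEEE double and whose default stringification keeps about 15 significant digits. It compares the default conversion with `sprintf '%.17g'`, which carries enough digits to identify any IEEE double uniquely.

```perl
use strict;
use warnings;

my $d = 2**-23;     # 1.1920928955078125e-07, exactly representable in binary

my $default = "$d";                   # Perl's default number-to-string conversion
my $precise = sprintf '%.17g', $d;    # 17 significant digits

print "default: $default\n";
print "precise: $precise\n";

# Converting the default string back does not recover the original value,
# because digits were dropped on the way out.
print "default string round-trips: ", (0 + $default == $d ? 'yes' : 'no'), "\n";
print "precise string round-trips: ", (0 + $precise == $d ? 'yes' : 'no'), "\n";
```

The second check also relies on the string-to-number conversion being correctly rounded, which depends on the perl build. Writing values with `pack 'd'` is another option when the data never needs to be human-readable.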

    It should work perfectly the first time! - toma

Re: Determining when Math::BigFloat is necessary?
by jobber (Sexton) on Feb 03, 2003 at 18:56 UTC
    You could always write a regular expression and see how many digits the number has and then base your answer on the return for the regexp.
    if ($digit =~ /\d{6}/)
    {
    print "use bigfloat\n";
    }
      But how do you determine how many digits is too many on the current platform?
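One possible answer, sketched under the assumption that Perl's native float is a C double: the POSIX module exports `DBL_DIG`, the number of decimal digits a double is guaranteed to preserve on the current platform. The helper names `significant_digits` and `too_many_digits` are made up for this example.

```perl
use strict;
use warnings;
use POSIX qw(DBL_DIG);

# Hypothetical helper: count the significant decimal digits in a numeric
# string, ignoring sign, decimal point, exponent and leading zeros.
sub significant_digits {
    my ($str) = @_;
    my ($mantissa) = $str =~ /^\s*[-+]?(\d*\.?\d*)(?:[eE][-+]?\d+)?\s*$/
        or return;                  # not a plain decimal number
    $mantissa =~ s/\.//;            # drop the decimal point
    $mantissa =~ s/^0+//;           # leading zeros carry no information
    return length $mantissa;
}

# True if the string has more digits than a double is guaranteed to keep,
# or if it could not be parsed at all.
sub too_many_digits {
    my ($str) = @_;
    my $n = significant_digits($str);
    return (!defined $n || $n > DBL_DIG()) ? 1 : 0;
}

for my $value ('123456789012345', '12345678901234567', '0.000123', '1.5e300') {
    printf "%-20s %s\n", $value, too_many_digits($value) ? 'use bigfloat' : 'native is fine';
}
```

Trailing zeros are counted as significant here, which errs on the side of using BigFloat.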
Re: Determining when Math::BigFloat is necessary?
by jepri (Parson) on Feb 04, 2003 at 01:41 UTC
    Just out of curiosity, what measurements are too big for perl? GPS readings to meter accuracy can be stored in native data types, so what kind of gear is getting you more accuracy?

    ____________________
    Jeremy
    I didn't believe in evil until I dated it.

      I haven't had problems with latitude/longitude/altitude; it's the elusive 'measurement' that shapefiles can contain that has been causing grief. The measurement value can be pretty much anything. If the lat/lon of the point represents a point in a river, the measurement might indicate flow rate, water temperature, depth, salinity, radioactivity, or whatever it is that you need to measure for your application.
Re: Determining when Math::BigFloat is necessary?
by bart (Canon) on Feb 05, 2003 at 01:57 UTC