PerlMonks  

When < isn't less than

by inelukii (Sexton)
on Sep 11, 2003 at 18:45 UTC ( [id://290777]=perlquestion )

inelukii has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have inherited some code that does what would appear to be a simple binning operation. However, I have discovered an error in the binning but for the life of me don't see what's wrong with the code.

Given a set of data between 0.9 and 1, the code should place it in 12 different bins.
Bin 0 is < 0.9,
Bin 1 >= 0.9 && < 0.91,
Bin 2 >= 0.91 && < 0.92,
etc..
Bin 11 >= 1.0

When run, some values get shifted down, others are placed in appropriate bins. So far, I am concerned with the < operator because it shows for example, that 0.99 is < 0.99. I hope there is a stupid mistake that I am just missing; I'd appreciate any assistance. I'm not currently concerned with whether the binning algorithm is efficient; first and foremost, it has to work.

Here's the code...

#!/usr/bin/perl
use strict;

my %final_data = (
    '1'  => 1.000,
    '2'  => 0.990,
    '3'  => 0.980,
    '4'  => 0.970,
    '5'  => 0.960,
    '6'  => 0.950,
    '7'  => 0.940,
    '8'  => 0.930,
    '9'  => 0.920,
    '10' => 0.910,
    '11' => 0.900,
    '12' => 0.890,
);

my $min = 0.90;
my $max = 1.00;
my $low;
my $high;
my $incr = 0.01;

DATA_ITEM:
for my $key ( sort { $a <=> $b } keys %final_data ) {
    $low  = $min;
    $high = $min + $incr;
    for my $bin ( 1 .. 10 ) {
        if( $final_data{$key} < $min ) {
            warn "$final_data{$key} fell in bin 0 ( $final_data{$key} < $min )\n";
            $low = $high;
            $high += $incr;
            next DATA_ITEM;
        }
        elsif( $final_data{$key} >= $max ) {
            warn "$final_data{$key} fell in bin 11 ( $final_data{$key} >= $max )\n";
            $low = $high;
            $high += $incr;
            next DATA_ITEM;
        }
        elsif( ($final_data{$key} >= $low) && ($final_data{$key} < $high) ) {
            warn "$final_data{$key} fell in bin $bin ( $final_data{$key} >= $low && $final_data{$key} < $high )\n";
            $low = $high;
            $high += $incr;
            next DATA_ITEM;
        }
        $low = $high;
        $high += $incr;
    }
}

Inelukii

Replies are listed 'Best First'.
Re: When < isn't less than
by dws (Chancellor) on Sep 11, 2003 at 19:07 UTC
    When run, some values get shifted down, others are placed in appropriate bins. So far, I am concerned with the < operator because it shows for example, that 0.99 is < 0.99.

    What it is actually showing is that

    (0.9 + 0.01 + 0.01 + ... + 0.01) < 0.99
    Welcome to the wild, wacky world of imprecise floating point representation. The problem, or one of them, is that the numbers you're using (other than 1.0) don't have precise counterparts in the internal floating point representation that chips use to represent real numbers. When you start adding them, the imprecision gets more noticeable.

    If you're concerned, you might be able to scale your data up by 100x, then scale back when it's time to display the buckets.
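    The scaling idea can be sketched roughly as follows. This is only an illustration, not the original poster's code: the helper name bin_for is made up, and it assumes every data value is recorded to at most two decimal places, so that multiplying by 100 and rounding gives an exact integer count of hundredths.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): map a value to a bin
# number by comparing integer hundredths instead of floating point sums.
# Assumes the data has at most two decimal places.
# Bin 0 is below 0.90, bin 11 is at or above 1.00, bins 1..10 are 0.01 wide.
sub bin_for {
    my ($value) = @_;

    # Round value * 100 to the nearest integer, e.g. 0.99 becomes 99.
    my $scaled = sprintf( "%.0f", $value * 100 );

    my ( $min, $max ) = ( 90, 100 );    # 0.90 and 1.00 in hundredths
    return 0  if $scaled < $min;
    return 11 if $scaled >= $max;
    return $scaled - $min + 1;
}

for my $value ( 1.000, 0.990, 0.980, 0.910, 0.900, 0.890 ) {
    printf "%.3f fell in bin %d\n", $value, bin_for($value);
}
```

    Because the comparisons are made on integers, 0.99 lands in bin 10 rather than being shifted down. Data with more than two decimal places would need a different scale factor, or a floor rather than a rounding step.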

When 0.99 isn't 0.99
by Thelonius (Priest) on Sep 11, 2003 at 19:16 UTC
    If you add these two lines:
    my $x = $final_data{$key};
    printf "x = %.20f high= %.20f\n", $x, $high;
    you will get this output:
    x = 0.98999999999999999000 high= 0.99000000000000010000
    x = 0.97999999999999998000 high= 0.98000000000000009000
    x = 0.96999999999999997000 high= 0.97000000000000008000
    x = 0.95999999999999996000 high= 0.96000000000000008000
    x = 0.94999999999999996000 high= 0.95000000000000007000
    x = 0.93999999999999995000 high= 0.94000000000000006000
    x = 0.93000000000000005000 high= 0.94000000000000006000
    x = 0.92000000000000004000 high= 0.93000000000000005000
    x = 0.91000000000000003000 high= 0.92000000000000004000
    x = 0.90000000000000002000 high= 0.91000000000000003000
    As you can see, the 0.99 that you get by assigning 0.99 is not the same as the 0.99 you get when you start with 0.90 and add 0.01 nine times. That's the way it is with binary floating point numbers. They're not exactly what you expect. There are several things you could do:
    1. Don't worry about it. If these are measurements, e.g., then ones that fall right on the boundaries of the bins are inherently ambiguous.
    2. Use an arbitrary-precision math package. There are ones for Perl, but I haven't used them, so I can't comment more.
    3. Scale everything so that they are all integers.
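    As a rough illustration of option 2, here is a sketch using Math::BigFloat, one arbitrary-precision module that ships with Perl. The reply above does not name a particular package, so the choice of module and the helper name accumulate_exact are assumptions made for this example. The values are passed as strings so that the decimal digits never pass through a binary float.

```perl
use strict;
use warnings;
use Math::BigFloat;

# Hypothetical helper (not from the thread): add $incr to $start $times
# times using decimal arithmetic. Arguments are strings such as '0.01'.
sub accumulate_exact {
    my ( $start, $incr, $times ) = @_;
    my $sum  = Math::BigFloat->new($start);
    my $step = Math::BigFloat->new($incr);
    $sum->badd($step) for 1 .. $times;    # badd modifies $sum in place
    return $sum;
}

my $x    = Math::BigFloat->new('0.99');
my $high = accumulate_exact( '0.90', '0.01', 9 );

print "x = $x, high = $high\n";

# bcmp returns a negative number, zero, or a positive number,
# in the same way as the <=> operator.
print "x is not less than high\n" unless $x->bcmp($high) < 0;
```

    With decimal arithmetic the running sum reaches exactly 0.99, so the comparison behaves the way the bin definitions expect. The cost is speed, which may or may not matter for a data set of this size.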
Re: When < isn't less than
by hossman (Prior) on Sep 11, 2003 at 19:30 UTC

    My favorite node on this topic is "Still puzzled by floats".

    Particularly because I had just done a bunch of research on this prior to seeing that node (in an attempt to educate many of my co-workers who didn't get it no matter how many times it bit them), and had it all fresh in my mind when I wrote my reply.

Re: When < isn't less than
by sutch (Curate) on Sep 11, 2003 at 19:12 UTC
    Your issue has to do with the way that numbers are represented by Perl. An article that is helpful in understanding this can be found at TPJ.
Re: When < isn't less than
by inelukii (Sexton) on Sep 11, 2003 at 21:29 UTC
    Thanks for the helpful responses. I'd printed with %.6f and hadn't seen any difference, so thanks for the clarification.

    Inelukii
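
    The point about %.6f can be reproduced with a short sketch. The helper name show is made up for this example, and the digits printed at 20 places vary somewhat between platforms, so no exact output is claimed here beyond the two values agreeing at 6 places and disagreeing at 20.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): format a value with a given
# number of decimal places.
sub show {
    my ( $value, $places ) = @_;
    return sprintf( "%.${places}f", $value );
}

my $x = 0.99;    # the literal, as stored in the data hash

# The bin boundary, built the way the original loop builds it.
my $high = 0.90;
$high += 0.01 for 1 .. 9;

for my $places ( 6, 20 ) {
    printf "%%.%df: x = %s high = %s\n",
        $places, show( $x, $places ), show( $high, $places );
}
```

    At six places both values print as 0.990000, which is why the discrepancy stayed hidden until more digits were requested.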
