Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi there. I have a probably relatively simple task to undertake and I was wondering what the best way to tackle it might be.
I have a text file that looks something like this.
123.7 456.7 564.7 234.9
etc
I want to be able to bin this data .. for example
123-124 456-457 3
So, if the value of x is between 123 and 124 and the value of y is between 456 and 457, increment the count by one. So, in this example case, there would be three such incidences in the text file.
I thought about tackling this using a couple of for loops but not quite sure how to proceed. Any general hints or suggestions would be much appreciated.

Replies are listed 'Best First'.
Re: placing values into bins
by tlm (Prior) on Apr 28, 2005 at 13:13 UTC

    If I am doing histograms, I prefer to use arrays for binning. The general approach goes like this. First, in 1-d:

    $histo[ ( $x - $x_min )/$x_bin_width ]++;
    For example, if your $x_min == 100, and your $x_bin_width == 5, then for $x == 123.45 the line above would add one more count to $histo[4]. Note that there's an implicit int() around the contents of the []; the line above is equivalent to the slightly longer:
    $histo[ int( ( $x - $x_min )/$x_bin_width ) ]++;
    Also note that when the point lands at a boundary between bins, this scheme assigns it to the bin on the right. E.g. using the same parameters as before if $x is exactly 125, the code above would add 1 to $histo[5], not to $histo[4].
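A minimal sketch of the 1-d scheme above, using the same parameters ($x_min == 100, $x_bin_width == 5). The helper name bin_index and the sample values are made up for illustration. One assumption goes beyond the original post: values below $x_min are skipped, because int() truncates toward zero and a negative subscript would count from the end of the array.

```perl
use strict;
use warnings;

# Hypothetical helper: bin index for $x, or nothing if $x lies left of the
# first bin (int() truncates toward zero, so such values must be filtered).
sub bin_index {
    my ($x, $x_min, $x_bin_width) = @_;
    return if $x < $x_min;
    return int( ( $x - $x_min ) / $x_bin_width );
}

my ($x_min, $x_bin_width) = (100, 5);
my @histo;
for my $x (123.45, 125, 101.2, 99.9) {    # made-up sample values
    my $i = bin_index($x, $x_min, $x_bin_width);
    next unless defined $i;               # 99.9 is below $x_min
    $histo[$i]++;
}

# Report only the bins that received at least one count.
for my $i (0 .. $#histo) {
    next unless $histo[$i];
    my $lo = $x_min + $i * $x_bin_width;
    printf "%g-%g %d\n", $lo, $lo + $x_bin_width, $histo[$i];
}
```

Here 123.45 lands in bin 4 and 125 in bin 5, matching the boundary rule described above.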

    Now, for the 2-d case, it's basically the same idea:

    $histo[ ( $x - $x_min )/$x_bin_width ][ ( $y - $y_min )/$y_bin_width ]++;
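As a rough illustration of the 2-d case, the sketch below bins "x y" pairs into unit-wide cells and prints each non-empty cell in the format the original question asked for. The grid origin (0, 0), the unit bin widths, and the sample lines are assumptions chosen so that three points fall into the 123-124 / 456-457 cell.

```perl
use strict;
use warnings;

# Assumed grid parameters: origin and bin widths.
my ($x_min, $y_min)             = (0, 0);
my ($x_bin_width, $y_bin_width) = (1, 1);

# Made-up input lines, one "x y" pair per line.
my @lines = ('123.7 456.7', '564.7 234.9', '123.2 456.1', '123.9 456.9');

my @histo;
for my $line (@lines) {
    my ($x, $y) = split ' ', $line;
    next if $x < $x_min or $y < $y_min;    # outside the grid
    # The array subscripts are truncated to integers implicitly.
    $histo[ ( $x - $x_min ) / $x_bin_width ][ ( $y - $y_min ) / $y_bin_width ]++;
}

for my $i (0 .. $#histo) {
    my $row = $histo[$i] or next;          # no counts in this x bin
    for my $j (0 .. $#$row) {
        my $count = $row->[$j] or next;    # empty cell
        my $x_lo = $x_min + $i * $x_bin_width;
        my $y_lo = $y_min + $j * $y_bin_width;
        printf "%g-%g %g-%g %d\n",
            $x_lo, $x_lo + $x_bin_width, $y_lo, $y_lo + $y_bin_width, $count;
    }
}
```

With this sample data the first line printed is "123-124 456-457 3".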

    the lowliest monk

Re: placing values into bins
by Transient (Hermit) on Apr 28, 2005 at 12:11 UTC
    You could use a hash of hashes and, if your range is fixed, key it on either the high or the low value of each range.

    For instance, if you were going to go with low values, your structure would look like $hash->{123}->{456} for the example.

    my $range = 1;
    my $hash = {};
    while (<>) {
        chomp;
        my ($num1, $num2) = split ' ',$_;
        # key the numbers
        my $int_num1 = int( $num1 ); # drop the decimal
        # this will create the correct key for an integer range
        # e.g. if you had a range of 5, this would result in
        # 120 being the first bucket
        my $key1 = $int_num1 - ($int_num1 % $range);
        # combined steps for key2
        my $key2 = int($num2) - (int($num2) % $range);
        $hash->{$key1}->{$key2}++;
    }
    # now you can get your indices
    foreach my $range1 ( keys %$hash ) {
        foreach my $range2 ( keys %{$hash->{$range1}} ) {
            print $range1."-".($range1+$range);
            print " ";
            print $range2."-".($range2+$range);
            print " ";
            print $hash->{$range1}->{$range2};
            print "\n";
        }
    }
    This is untested, but I think (hope) it gives you at least an idea of one way to do it. Of course you can make it more efficient, shorter, etc.
Re: placing values into bins
by jdporter (Paladin) on Apr 28, 2005 at 12:54 UTC
    The numbers you give in your example suggest that you may be able to make some simplifying assumptions. In particular, the (one!) example of a bin range has ranges for x and y which are exactly one whole number wide. If you can say that that is true for all bins, then it is possible to calculate directly which bin a datum should go in: simply apply int() to x and to y. This assumption also makes it reasonable to use arrays rather than hashes to store the bins. So:
    my @data = (
        [ 123.7, 456.7 ],
        [ 564.7, 234.9 ],
    );
    my @bins;
    for my $datum ( @data ) {
        my( $x, $y ) = @$datum;
        $bins[ int $x ][ int $y ]++;
    }
    # print only the bins that hold at least one datum
    for my $x ( 0 .. $#bins ) {
        defined $bins[$x] or next;
        for my $y ( 0 .. $#{$bins[$x]} ) {
            defined $bins[$x][$y] or next;
            print "$x $y $bins[$x][$y]\n";
        }
    }
      Hi there. Thanks for your replies so far. Much appreciated.
      But what if I wanted to choose 125.5-130.0 and 456.5-457.0 as a range for example? How simple would that be to do?

        If you follow the approach I sketched out in my other reply, all you need to do is pick the "left ends" of the ranges (in this case 125.5 and 456.5), and the desired bin widths; perl takes care of the right ends of the ranges depending on the actual data.
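A sketch of that idea for the ranges mentioned in the question, assuming bin widths of 4.5 in x (125.5-130.0) and 0.5 in y (456.5-457.0). The data points are invented, and skipping values below the left ends is an added assumption.

```perl
use strict;
use warnings;

my ($x_min, $x_bin_width) = (125.5, 4.5);    # x bins: 125.5-130.0, 130.0-134.5, ...
my ($y_min, $y_bin_width) = (456.5, 0.5);    # y bins: 456.5-457.0, 457.0-457.5, ...

my @points = ([126.1, 456.7], [129.9, 456.9], [131.0, 457.2]);    # made-up data

my @histo;
for my $p (@points) {
    my ($x, $y) = @$p;
    next if $x < $x_min or $y < $y_min;      # left of the first bin
    $histo[ ( $x - $x_min ) / $x_bin_width ][ ( $y - $y_min ) / $y_bin_width ]++;
}

# Turn each non-empty cell's indices back into its range.
for my $i (0 .. $#histo) {
    my $row = $histo[$i] or next;
    for my $j (0 .. $#$row) {
        my $count = $row->[$j] or next;
        printf "%g-%g %g-%g %d\n",
            $x_min + $i * $x_bin_width, $x_min + ($i + 1) * $x_bin_width,
            $y_min + $j * $y_bin_width, $y_min + ($j + 1) * $y_bin_width,
            $count;
    }
}
```

Because the bin index comes from a floating-point division, a value that sits exactly on a non-integer boundary may land on either side of it.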

        the lowliest monk

        If you want to use the int method but decide you want differently sized bins, my simple solution would be to create a new sub that returns the left end of the range, and just replace 'int' with that sub.
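One possible shape for such a sub, shown with the combined-hash-key style from elsewhere in this thread. The name left_end, its parameters, and the data are hypothetical. It uses POSIX::floor rather than int so that values below the origin still round down to the correct left end.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: left end of the bin of width $width, counted from
# $origin, that contains $value.
sub left_end {
    my ($value, $origin, $width) = @_;
    return $origin + $width * floor( ( $value - $origin ) / $width );
}

my %bins;
for my $pair ([126.1, 456.7], [129.9, 456.9], [131.0, 457.2]) {    # made-up data
    my ($x, $y) = @$pair;
    my $key = left_end($x, 125.5, 4.5) . ' ' . left_end($y, 456.5, 0.5);
    $bins{$key}++;
}

print "$_: $bins{$_}\n" for sort keys %bins;
```

The hash keys here are stringified floating-point numbers, which is safe for widths such as 0.5 or 4.5 but may produce untidy keys for widths like 0.1. Keying on the integer bin index instead would avoid that.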
Re: placing values into bins
by pboin (Deacon) on Apr 28, 2005 at 12:16 UTC

    There's definitely more than one way to do this, but I thought I'd KISS, and combine the two values into one hash key instead of using two hashes. That also simplifies the display. It looks like this:

    #!/usr/bin/perl -w
    use strict;
    my ($x, $y);
    my %hash;
    my $key;
    while (<DATA>) {
        /(\w.*)\ (\w.*)/;
        $key = int($1) . ' ' . int($2);
        $hash{$key}++;
    }
    foreach my $item (sort( keys(%hash))) {
        print $item . ': ' . $hash{$item} . "\n";
    }
    __DATA__
    123.7 456.7
    564.7 234.9
    123.7 456.7
    564.7 234.9
    654.9 132.7
    518.0 025.3
Re: placing values into bins
by Joost (Canon) on Apr 28, 2005 at 12:15 UTC
    So, if the value of x is between 123 and 124 and the value of y is between 456 and 457, increment the count by one.
    Where do these constraints come from? What do you want to do if the values are exactly 123 and 457? Do you mean something like this?
    my $count = 0;
    while(<STDIN>) {
        chomp;
        my ($x,$y) = split;
        if ($x > 123 and $x < 124 and $y > 456 and $y < 457) {
            $count++;
        }
    }
    print "123-124 456-457 $count\n";
Re: placing values into bins
by Anonymous Monk on Apr 28, 2005 at 13:57 UTC
    Would it be sensible to create a hash as follows
    %xy
    where .. in a loop of some sort I could create the ranges, for example:
    for ($x = 0; $x <= $max; $x++) {
        for ($y = 0; $y <= $max; $y++) {
            $key1 = $x;
            $key2 = $y;
        }
    }
    Then compare my input data with the predefined ranges and increment a counter of some kind?
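One way this could look, as a sketch only: the predefined ranges are kept in a list and every input point is compared against each of them. The sub name count_in_ranges, the range list, and the data are illustrative. Treating each range as half-open (low end included, high end excluded) is an assumption, since the thread does not settle what should happen on a boundary.

```perl
use strict;
use warnings;

# Predefined bins as [x_low, x_high, y_low, y_high]; example values only.
my @ranges = (
    [ 123, 124,   456,   457 ],
    [ 120, 120.5, 134.5, 135 ],
);

# Hypothetical helper: one counter per range, in the same order as @$ranges.
sub count_in_ranges {
    my ($ranges, $points) = @_;
    my @counts = (0) x @$ranges;
    for my $p (@$points) {
        my ($x, $y) = @$p;
        for my $i (0 .. $#$ranges) {
            my ($x_lo, $x_hi, $y_lo, $y_hi) = @{ $ranges->[$i] };
            # Half-open intervals: low end included, high end excluded.
            $counts[$i]++
                if $x >= $x_lo && $x < $x_hi && $y >= $y_lo && $y < $y_hi;
        }
    }
    return @counts;
}

my @points = ([123.7, 456.7], [564.7, 234.9], [123.2, 456.1], [120.3, 134.6]);
my @counts = count_in_ranges(\@ranges, \@points);

for my $i (0 .. $#ranges) {
    my ($x_lo, $x_hi, $y_lo, $y_hi) = @{ $ranges[$i] };
    print "$x_lo-$x_hi $y_lo-$y_hi $counts[$i]\n";
}
```

This compares every point with every range, which is fine for a handful of ranges. For a regular grid of equal-width bins, computing the bin index directly is likely to be faster.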
Re: placing values into bins
by Anonymous Monk on Apr 28, 2005 at 13:44 UTC
    Hi again.
    I get the feeling that I haven't quite explained what I need correctly. The ranges themselves have to be predefined, if that's the right word. So, for example, I might need to see if any of my values are in the range
    x = 120-120.5 and y = 134.5-135.0
    And so on. Does that make sense?
    Thanks again
Re: placing values into bins
by Anonymous Monk on Apr 28, 2005 at 12:46 UTC
    Thank you very much. You have all been very helpful :)