Re: placing values into bins
by tlm (Prior) on Apr 28, 2005 at 13:13 UTC
|
If I am doing histograms, I prefer to use arrays for binning. The general approach goes like this. First, in 1-d:
$histo[ ( $x - $x_min )/$x_bin_width ]++;
For example, if your $x_min == 100, and your $x_bin_width == 5, then for $x == 123.45 the line above would add one more count to $histo[4]. Note that there's an implicit int() around the contents of the []; the line above is equivalent to the slightly longer:
$histo[ int( ( $x - $x_min )/$x_bin_width ) ]++;
Also note that when a point lands exactly on a boundary between bins, this scheme assigns it to the bin on the right. E.g., using the same parameters as before, if $x is exactly 125, the code above would add 1 to $histo[5], not to $histo[4].
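A minimal, self-contained sketch of the 1-d scheme, using the same numbers as above. The bin_index helper and the sample values are made up for illustration. The check for $x < $x_min is an addition: int() truncates toward zero, so a value just left of the first bin would otherwise land in bin 0, and a more negative index would count from the end of the array.

```perl
use strict;
use warnings;

# Parameters from the example above; the sample data is hypothetical.
my ( $x_min, $x_bin_width ) = ( 100, 5 );

# Index of the bin that $x falls into (explicit int() for clarity).
sub bin_index {
    my ($x) = @_;
    return int( ( $x - $x_min ) / $x_bin_width );
}

my @histo;
for my $x ( 101, 123.45, 124.9, 125 ) {
    next if $x < $x_min;    # left of the first bin: skip rather than miscount
    $histo[ bin_index($x) ]++;
}

# Print every bin up to the last occupied one; empty bins show 0.
for my $i ( 0 .. $#histo ) {
    my $lo = $x_min + $i * $x_bin_width;
    printf "%g-%g: %d\n", $lo, $lo + $x_bin_width, $histo[$i] || 0;
}
```

Because the bins live in an array, the empty bins between the first and last occupied ones show up in the output for free.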
Now, for the 2-d case, it's basically the same idea:
$histo[ ( $x - $x_min )/$x_bin_width ][ ( $y - $y_min )/$y_bin_width ]++;
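The 2-d line can be tried out with something like the sketch below. The grid parameters and the sample points are assumptions chosen only for illustration, and the bounds check is an addition to the one-liner.

```perl
use strict;
use warnings;

# Hypothetical grid: x bins start at 100 and are 5 wide,
# y bins start at 0 and are 10 wide.
my ( $x_min, $x_bin_width ) = ( 100, 5 );
my ( $y_min, $y_bin_width ) = ( 0,   10 );

# Made-up sample points, as [ x, y ] pairs.
my @points = ( [ 123.45, 7 ], [ 121, 3.2 ], [ 125, 19.9 ] );

my @histo;
for my $p (@points) {
    my ( $x, $y ) = @$p;
    next if $x < $x_min or $y < $y_min;    # outside the grid
    my $i = int( ( $x - $x_min ) / $x_bin_width );
    my $j = int( ( $y - $y_min ) / $y_bin_width );
    $histo[$i][$j]++;
}

# Report only the cells that received at least one point.
for my $i ( 0 .. $#histo ) {
    my $row = $histo[$i] or next;
    for my $j ( 0 .. $#$row ) {
        my $n = $row->[$j] or next;
        printf "x bin %d, y bin %d: %d\n", $i, $j, $n;
    }
}
```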
Re: placing values into bins
by Transient (Hermit) on Apr 28, 2005 at 12:11 UTC
|
You could use a hash of hashes and, if your range is fixed, use either the low or the high end of each range as the key.
For instance, if you keyed on the low values, your structure would look like $hash->{123}->{456} for the example.
my $range = 1;    # bin width; % is integer modulus, so this must be a whole number
my $hash = {};
while (<>) {
    chomp;
    my ($num1, $num2) = split ' ', $_;    # split the line into the two numbers
    my $int_num1 = int( $num1 );          # drop the decimal
    # this will create the correct key for an integer range
    # e.g. if you had a range of 5, 123.7 would land in the
    # bucket keyed 120
    my $key1 = $int_num1 - ($int_num1 % $range);
    # combined steps for key2
    my $key2 = int($num2) - (int($num2) % $range);
    $hash->{$key1}->{$key2}++;
}
# now you can get your indices
foreach my $range1 ( keys %$hash ) {
    foreach my $range2 ( keys %{$hash->{$range1}} ) {
        print $range1."-".($range1+$range);
        print " ";
        print $range2."-".($range2+$range);
        print " ";
        print $hash->{$range1}->{$range2};
        print "\n";
    }
}
This is untested, but I think (hope) it gives you at least an idea of one way to do it. Of course you can make it more efficient, shorter, etc.
Re: placing values into bins
by jdporter (Paladin) on Apr 28, 2005 at 12:54 UTC
|
The numbers you give in your example suggest that you may be able to make some simplifying assumptions.
In particular, the (one!) example of a bin range has ranges for x and y which are exactly one whole number wide. If you can say that that is true for all bins, then it is possible to calculate directly which bin a datum should go in: simply apply int() to x and to y. This assumption also makes it reasonable to use arrays rather than hashes to store the bins. So:
my @data = (
    [ 123.7, 456.7 ],
    [ 564.7, 234.9 ],
);
my @bins;
for my $datum ( @data ) {
    my( $x, $y ) = @$datum;
    $bins[ int $x ][ int $y ]++;
}
# print only the bins that were actually filled
for my $x ( 0 .. $#bins ) {
    defined $bins[$x] or next;
    for my $y ( 0 .. $#{$bins[$x]} ) {
        defined $bins[$x][$y] or next;
        print "$x $y $bins[$x][$y]\n";
    }
}
|
|
Hi there. Thanks for your replies so far. Much appreciated.
But what if I wanted to choose 125.5-130.0 and 456.5-457.0 as a range for example? How simple would that be to do?
|
|
If you follow the approach I sketched out in my other reply, all you need to do is pick the "left ends" of the ranges (in this case 125.5 and 456.5), and the desired bin widths; perl takes care of the right ends of the ranges depending on the actual data.
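As a rough sketch of what that looks like with the ranges from the question: the left ends are 125.5 and 456.5, and the widths (4.5 and 0.5) follow from 125.5-130.0 and 456.5-457.0. The make_binner helper and the sample points are hypothetical, introduced only for this example.

```perl
use strict;
use warnings;

# Returns a closure that maps a value to its bin index.
# make_binner is a hypothetical helper, not part of the original post.
sub make_binner {
    my ( $min, $width ) = @_;
    return sub {
        my ($v) = @_;
        return undef if $v < $min;    # left of the first bin
        return int( ( $v - $min ) / $width );
    };
}

# x bins: 125.5-130.0, 130.0-134.5, ...
# y bins: 456.5-457.0, 457.0-457.5, ...
my $x_bin = make_binner( 125.5, 4.5 );
my $y_bin = make_binner( 456.5, 0.5 );

my @histo;
for my $p ( [ 127.0, 456.7 ], [ 129.9, 456.9 ], [ 131.2, 457.1 ] ) {
    my ( $i, $j ) = ( $x_bin->( $p->[0] ), $y_bin->( $p->[1] ) );
    next unless defined $i and defined $j;
    $histo[$i][$j]++;
}

print "points in 125.5-130.0 x 456.5-457.0: $histo[0][0]\n";
```

Only the left ends and the widths are specified; the right ends fall out of the arithmetic, and further bins are created as the data requires.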
|
|
If you want to use the int method but decide you want different-sized bins, my simple solution would be to create a new sub that returns the left end of the range, and just replace 'int' with that sub.
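One way such a sub might look is sketched below. The name left_end, its optional $origin argument, and the bin width of 0.5 are assumptions for illustration. Since the left ends are no longer small whole numbers, this sketch stores the counts in a hash of hashes rather than in arrays.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Hypothetical helper: left end of the bin containing $v, for bins of
# width $width whose edges are offset from zero by $origin (default 0).
sub left_end {
    my ( $v, $width, $origin ) = @_;
    $origin = 0 unless defined $origin;
    return $origin + $width * floor( ( $v - $origin ) / $width );
}

my %bins;
for my $p ( [ 123.7, 456.7 ], [ 123.9, 456.6 ], [ 564.7, 234.9 ] ) {
    my $kx = left_end( $p->[0], 0.5 );
    my $ky = left_end( $p->[1], 0.5 );
    $bins{$kx}{$ky}++;
}

# Numeric sort, so that the bins come out in order.
for my $kx ( sort { $a <=> $b } keys %bins ) {
    for my $ky ( sort { $a <=> $b } keys %{ $bins{$kx} } ) {
        print "$kx $ky $bins{$kx}{$ky}\n";
    }
}
```

Widths such as 0.5 or 0.25 are exactly representable in binary floating point and give clean keys. With a width like 0.1 the computed left ends may differ in the last digits, so the keys would probably need rounding (for example with sprintf) before use.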
Re: placing values into bins
by pboin (Deacon) on Apr 28, 2005 at 12:16 UTC
|
There's definitely more than one way to do this, but I thought I'd KISS, and combine the two values into one hash key instead of using two hashes. That also simplifies the display. It looks like this:
#!/usr/bin/perl -w
use strict;
my %hash;
while (<DATA>) {
    # capture the two whitespace-separated values; skip lines that don't match
    my ($x, $y) = /(\S+)\s+(\S+)/ or next;
    my $key = int($x) . ' ' . int($y);
    $hash{$key}++;
}
foreach my $item (sort( keys(%hash))) {
    print $item . ': ' . $hash{$item} . "\n";
}
__DATA__
123.7 456.7
564.7 234.9
123.7 456.7
564.7 234.9
654.9 132.7
518.0 025.3
Re: placing values into bins
by Joost (Canon) on Apr 28, 2005 at 12:15 UTC
|
So, if the value of x is between 123 and 124 and value of y is between 456 and 457 increment the count by so.
Where do these constraints come from? What do you want to do if the values are exactly 123 and 457? Do you mean something like this?
my $count = 0;
while(<STDIN>) {
    chomp;
    my ($x,$y) = split;
    if ($x > 123 and $x < 124 and $y > 456 and $y < 457) {
        $count++;
    }
}
print "123-124 456-457 $count\n";
Re: placing values into bins
by Anonymous Monk on Apr 28, 2005 at 13:57 UTC
|
Would it be sensible to create a hash like this:
%xy
where, in a loop of some sort, I could create the ranges? For example:
for ($x = 0; $x <= $max; $x++)
{
    for ($y = 0; $y <= $max; $y++)
    {
        $key1 = $x;
        $key2 = $y;
        $xy{$key1}{$key2} = 0;    # start each predefined range at zero
    }
}
Then compare my input data with the predefined ranges and increment a counter of some kind?
Re: placing values into bins
by Anonymous Monk on Apr 28, 2005 at 13:44 UTC
|
Hi again.
I get the feeling that I haven't quite explained what I need correctly. The ranges themselves have to be predefined, if that's the right word. So, for example, I might need to see if any of my values are in the range
x = 120-120.5 and y = 134.5-135.0
And so on. Does that make sense?
Thanks again
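For ranges that are predefined and not necessarily on a regular grid, one possible sketch is to keep an explicit list of ranges and test each point against it. The first range below is the one from this post; the second range, the sample points, and the choice of an inclusive lower bound with an exclusive upper bound are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical list of predefined ranges: [ x_lo, x_hi, y_lo, y_hi ].
# The first entry is the range from the post; the second is made up.
my @ranges = (
    [ 120.0, 120.5, 134.5, 135.0 ],
    [ 123.0, 124.0, 456.0, 457.0 ],
);
my @count = (0) x @ranges;

# Made-up sample points, as [ x, y ] pairs.
my @points = ( [ 120.2, 134.7 ], [ 123.7, 456.7 ], [ 120.4, 134.9 ], [ 564.7, 234.9 ] );

for my $p (@points) {
    my ( $x, $y ) = @$p;
    for my $i ( 0 .. $#ranges ) {
        my ( $x_lo, $x_hi, $y_lo, $y_hi ) = @{ $ranges[$i] };
        # Assumption: lower bound inclusive, upper bound exclusive.
        $count[$i]++
            if $x >= $x_lo and $x < $x_hi and $y >= $y_lo and $y < $y_hi;
    }
}

# One line per predefined range, including ranges with a count of zero.
for my $i ( 0 .. $#ranges ) {
    printf "%s-%s %s-%s %d\n", @{ $ranges[$i] }, $count[$i];
}
```

This checks every point against every range, which is fine for a handful of ranges. If the ranges form a regular grid, the index arithmetic from the earlier replies avoids the inner loop.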
Re: placing values into bins
by Anonymous Monk on Apr 28, 2005 at 12:46 UTC
|
Thank you very much. You have all been very helpful :)