austinby has asked for the wisdom of the Perl Monks concerning the following question:

Hi perlmonks,

I am trying to bin the 2D data shown below in a grid. i.e. (x1,y1), (x1,y2),...(x1,yn), (x2,y1), (x2,y2),...(x2,yn), etc.
FYI this is not a school assignment, etc.

X Y
3.70722 0.54617
3.67342 0.64892
3.75017 0.85075
3.82831 0.83506
3.71852 0.79251
3.80706 0.70426
3.91839 0.94094
3.90726 0.81722
3.89007 0.76107
3.98006 0.62477

Here's how I am proceeding:
1). Create an array of x-values (@egvx)
2). Create an array of y-values (@egvy)
3). Get the minimum and maximum values of each array ($minx, $maxx, $miny, $maxy)
4). Create two ranges: @rgx = ($minx..$maxx) and @rgy = ($miny..$maxy) with regular spacings of 0.1.
5). For each value in @egvx, loop through all the values in @egvy and check where each pair of values lies by looping through bins defined by ($rgx[$x] and $rgx[$x] + 0.1), and ($rgy[$y] and $rgy[$y] + 0.1)
6). If a pair of @egvx and @egvy values falls between ($rgx[$x] and $rgx[$x] + 0.1), and ($rgy[$y] and $rgy[$y] + 0.1) increase the count of that bin.

Steps five and six are the problem areas. First, I have 196 bins, but after running the script, I get only 90 bins. Secondly, the bin counts are all zero.

It's a pretty long code. I have shown only that section where I try to handle steps 5 & 6.

my $j1x = 0; my $j1y = 0; my $k = 0; my $ctxy = 0; #create the number of bins while($j1x < $kx) { while($j1y < $ky) { $bin[$ctxy] = "bin\t$num"; $binCnt[$ctxy] = 0; $num++; $ctxy++; $j1y++; } $j1y = 0; $j1x++; } #@egvx = x-values, @rgx = range of x-values, @binx = x-bins #@egvy = y-values, @rgy = range of y-values, @biny = y-bins #@upy and @upx = upper-boundaries of yth and xth bins my $j = 0; my $x = 0; my $y = 0; my $c = 0; open(TST,">test.txt"); while($j < $i) #loops through the x-values { while($c < $i)#loops through the y-values { while($y < $ky)#loops through the y-bins { $upy[$y] = $rgy[$y] + 0.1; while($x < $kx)#loops through the x-bins { $upx[$x] = $rgx[$x] + 0.1; #check which bin the x-value falls into if(($egvx[$j] > $rgx[$x])and($egvx[$j] < $upx[$x])) { #print TST "$egvx[$j]\t$rgx[$x]\t$upx[$x]\n"; #record x-center of the bin $binx[$x] = ($rgx[$x] + $upx[$x])/2; #check which bin the y-value falls into if(($egvy[$c] > $rgy[$y])and($egvy[$c] < $upy[$y])) { #record y-center of the bin $biny[$y] = ($rgy[$y] + $upy[$y])/2; $binCnt[$x] += $binCnt[0]; print TST "$binCnt[$x]\n"; #print TST "x: $egvx[$j] $rgx[$x] $upx[$x]\t y: $egvy[$j]\ +t$rgy[$y]\t$upy[$y]\n"; } } $x++; } $x = 0; #reset x-bins $y++; } $y = 0; #reset y-bins $c++; } $c = 0; #reset y-values until all x-values have been tested $j++; }

Replies are listed 'Best First'.
Re: 2D binning
by BrowserUk (Patriarch) on Feb 18, 2010 at 23:58 UTC

    It's really quite hard to tell from either your description, or your code (was it translated from another language? FORTRAN?), what you are trying to do.

    My best guess is that you are trying to count the number of xy pairs that fall into each 0.1 interval in both the x & y dimensions. If so, your code is working far too hard. I think this might be a simpler approach:

    #! perl -slw use strict; use List::Util qw[ sum ]; use Data::Dump qw[ pp ]; our $X //= 10; our $Y //= 10; my @xs = map -3+rand 6, 1 .. $X; my @ys = map -3+rand 6, 1 .. $Y; my %bins; for my $x ( @xs ) { my $xc = ( int( $x * 10 ) + 0.5 ) / 10; for my $y ( @ys ) { my $yc = ( int( $y * 10 ) + 0.5 ) / 10; ++$bins{ "$xc:$yc" }; } } pp \%bins; printf "Buckets used: %d, total values %d \n", scalar keys %bins, sum values %bins;

    Output:


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: 2D binning
by spx2 (Deacon) on Feb 19, 2010 at 00:08 UTC
    much more efficient would be to divide the coordinates length by the bin side and round that off and concatenate those together and they form a pretty good hash key(and you don't need to compute minimum or maximum any more). but here's a solution following your line of thought:
    use strict; use warnings; use List::AllUtils qw/minmax/; my @points = ( [3.70722,0.54617,], [3.67342,0.64892,], [3.75017,0.85075,], [3.82831,0.83506,], [3.71852,0.79251,], [3.80706,0.70426,], [3.91839,0.94094,], [3.90726,0.81722,], [3.89007,0.76107,], [3.98006,0.62477,], ); my $bins; my $binside = 0.1; my ($xmin,$xmax) = minmax( map { $_->[0] } @points ); my ($ymin,$ymax) = minmax( map { $_->[1] } @points ); for( my $xbin=$xmin ; $xbin<=$xmax ; $xbin+= $binside) { for(my $ybin=$ymin ; $ybin<=$ymax ; $ybin+= $binside) { my $p = $points[0]; if( $p->[0] >= $xbin && $p->[0] < $xbin + $binside && $p->[1] >= $ybin && $p->[1] < $ybin + $binside ) { $bins->{$xbin.$ybin} = [] unless($bins->{$xbin.$ybin}); push @{ $bins->{$xbin.$ybin} } , (shift @points); }; } } print scalar(keys(%$bins)); 4
    this sounds a lot like the bin packing problem.
      Thanks a lot for your input monks,
      I'll test all your ideas and send updates.
      Austin-
        Hi perlmonks,
        I tested the code sent by spx2 and BrowserUk after installing the List::AllUtils and Data::Dump modules.
        I get "4" as output from spx2's code.
        #!/usr/bin/perl -slw use strict; use warnings; use List::AllUtils qw/minmax/; my @points = ( [3.70722,0.54617,], [3.67342,0.64892,], [3.75017,0.85075,], [3.82831,0.83506,], [3.71852,0.79251,], [3.80706,0.70426,], [3.91839,0.94094,], [3.90726,0.81722,], [3.89007,0.76107,], [3.98006,0.62477,], ); my $bins; my $binside = 0.1; my ($xmin,$xmax) = minmax( map { $_->[0] } @points ); my ($ymin,$ymax) = minmax( map { $_->[1] } @points ); for( my $xbin=$xmin ; $xbin<=$xmax ; $xbin+= $binside) { for(my $ybin=$ymin ; $ybin<=$ymax ; $ybin+= $binside) { my $p = $points[0]; if( $p->[0] >= $xbin && $p->[0] < $xbin + $binside && $p->[1] >= $ybin && $p->[1] < $ybin + $binside ) { $bins->{$xbin.$ybin} = [] unless($bins->{$xbin.$ybin}); push @{ $bins->{$xbin.$ybin} } , (shift @points); }; } } print scalar(keys(%$bins));
        Not sure what the "4" means.

        I get a "Global symbol "$Y" requires explicit package name at line " error with BrowserUk's code. When I change the positions of $Y and $X, I get a "Global symbol "$X" requires explicit package name at line "
        #!/usr/bin/perl -slw use strict; use List::Util qw[ sum ]; use Data::Dump qw[ pp ]; our $Y //= 10; our $X //= 10; my @xs = map -3+rand 6, 1 .. $X; my @ys = map -3+rand 6, 1 .. $Y; my %bins; for my $x ( @xs ) { my $xc = ( int( $x * 10 ) + 0.5 ) / 10; for my $y ( @ys ) { my $yc = ( int( $y * 10 ) + 0.5 ) / 10; ++$bins{ "$xc:$yc" }; } } pp \%bins; printf "Buckets used: %d, total values %d \n", scalar keys %bins, sum values %bins;
        There's a script online that does exactly what I'm trying to do (http://homepages.ulb.ac.be/~dgonze/SCRIPTS/PERL/histogram-2d.pl), but requires user-defined inputs. I'd like to do this such that all the user needs is the input file.
A reply falls below the community's threshold of quality. You may see it by logging in.