Beefy Boxes and Bandwidth Generously Provided by pair Networks
good chemistry is complicated,
and a little bit messy -LW
 
PerlMonks  

Re: Sorting by geographical proximity / clumping groups of items based on X and Y

by DamnDirtyApe (Curate)
on Jul 17, 2002 at 23:22 UTC ( #182638=note: print w/replies, xml ) Need Help??


in reply to Sorting by geographical proximity / clumping groups of items based on X and Y

Here's a quick example that will hopefully get you started.

#! /usr/bin/perl use strict ; use warnings ; my @complaints = ( { 'complaint_id' => 'a123', 'x' => '45.2', 'y' => '39.7' }, { 'complaint_id' => 'b456', 'x' => '79.3', 'y' => '42.0' }, { 'complaint_id' => 'c789', 'x' => '11.9', 'y' => '29.8' }, { 'complaint_id' => 'd863', 'x' => '95.3', 'y' => '17.2' }, { 'complaint_id' => 'e635', 'x' => '65.5', 'y' => '33.3' }, ) ; my ( $curr_x, $curr_y ) = ( 36.2, 47.3 ) ; my @sorted_by_proximity = map { $_->[1]->{'dist'} = $_->[0]; $_->[1] } sort { $a->[0] <=> $b->[0] } map { [ distance( $curr_x, $curr_y, $_->{'x'}, $_->{'y'} ), $_ ] } @complaints ; foreach ( @sorted_by_proximity ) { print $_->{'complaint_id'} . " is " . $_->{'dist'} . " away.\n" ; } sub distance { my ($x1, $y1, $x2, $y2 ) = @_ ; return sqrt( abs( $x2 - $x1 )**2 + abs( $y2 - $y1 )**2 ) ; }

This gives you an array ref of complaints, sorted closest-first. That way, you could take the first n complaints, or all the complaints until the distance is greater than x, etc.

Update: Thanks to dws and hossman both correct. Here's mv revised code which may be a little more useful.

#! /usr/bin/perl use strict ; use warnings ; # Define some rules and constants. my $CLUSTER_RADIUS = 10 ; # Max distance to be included in cluster. my $CENTER_SKIP = 5 ; # Distance between centers on cluster scans +. my $CLUSTER_COUNT = 4 ; # Minimum no. of complaints to make a clust +er. my $START_X = 0 ; my $END_X = 100 ; my $START_Y = 0 ; my $END_Y = 100 ; # Load the complaints. my @complaints = () ; while ( <DATA> ) { next if /^$/ ; my $hashref = {} ; ( $hashref->{'id'}, $hashref->{'x'}, $hashref->{'y'} ) = split ; push @complaints, $hashref ; } for ( my $curr_x = $START_X ; $curr_x <= $END_X ; $curr_x += $CENTER_SKIP ) { for ( my $curr_y = $START_Y ; $curr_y <= $END_Y ; $curr_y += $CENTER_SKIP ) { my @near = map { $_->[1]->{'dist'} = $_->[0] ; $_->[1] } grep { $_->[0] <= $CLUSTER_RADIUS } map { [ distance( $curr_x, $curr_y, $_->{'x'}, $_->{'y'} ), $_ ] } @complaints ; my $complaint_count = @near ; if ( $complaint_count >= $CLUSTER_COUNT ) { printf "%d complaints near (%d,%d).\n", $complaint_count, $curr_x, $curr_y ; } } } exit 0 ; #------------------------------------------------------- sub distance { my ($x1, $y1, $x2, $y2 ) = @_ ; return sqrt( ( $x2 - $x1 )**2 + ( $y2 - $y1 )**2 ) ; } __DATA__ a123 10.1 12.5 b124 12.5 8.6 c125 9.8 10.2 d213 24.6 19.5 e753 35.6 2.2 f854 76.5 46.7 g354 8.7 8.6

_______________
D a m n D i r t y A p e
Home Node | Email

Replies are listed 'Best First'.
Re: Re: Sorting by geographical proximity / clumping groups of items based on X and Y
by dws (Chancellor) on Jul 17, 2002 at 23:31 UTC
      return sqrt( abs( $x2 - $x1 )**2 + abs( $y2 - $y1 )**2 ) ; The abs() isn't necessary, since the result of squaring the difference will always be positive.

    For purposes of ordering pairs by distance, it is sufficient to sum the squared differences and sort on distance^2, deferring the sqrt() until a true distance is necessary. (I picked up this trick from Jon Bentley's "Writing Efficient Programs," which is out of print, but worth grabbing if you find a copy in a used bookstore.)

    You can also exclude from consideration, before sorting, any pair that has a difference in either dimension that is > X. Such pairs will be excluded anyway, so why clog up the sort with them.

Re: Re: Sorting by geographical proximity / clumping groups of items based on X and Y
by hossman (Prior) on Jul 18, 2002 at 00:08 UTC
    that assumes a $curr_x and a $curr_y ... my understanding of the question was to find clusters of complaints -- not just the problems near a given location.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://182638]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (4)
As of 2022-12-08 03:44 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found

    Notices?