Re: Sorting by geographical proximity / clumping groups of items based on X and Y

in reply to Re: Re: Sorting by geographical proximity / clumping groups of items based on X and Y
in thread Sorting by geographical proximity / clumping groups of items based on X and Y

Despite what I just said about NP-completeness, the following algorithm might give a reasonable solution (certainly not optimal).

Let C be the set of all complaints.
For each complaint c in C find the set S(c) of all complaints d in C such that the distance between c and d is less than X (X is user defined).
Find complaint e in C such that |S(e)| <= |S(f)| for all f in C; that is, find the complaint who has the most other complaints nearby. Pick a random one in case of a tie.
Make a clump out of e and S(e).
Remove all complaints g in S(e) from C. For all h remaining in C, remove from S(h) all g in S(e).
If C is empty, we're done. Else, goto 3.

Some pseudo code:

# Get set of all complaints.
my @C = get_all_complaints;
# Find all the associated sets.
my %D = map {my $c = $_;
             $c => {map  {$_ => 1}
                    grep {$c ne $_ && distance ($c, $_) < $X} @C}} @C;
while (%D) {
    # Find complaint with the most nearby.
    my ($complaint, $size) = (undef, 0);
    while (my ($c, $set) = each %D) {
        ($complaint = $c, $size = keys %$set) if keys %$set > $size;
    }
    # Found largest, make a clump.
    make_clump ($complaint, @{$D {$complaint}});
    # Delete largest from set.
    my $set = delete $D {$complaint};
    # Delete associated set from set.
    delete @D {keys %$set};
    # Delete associated set from associated sets.
    delete @{$_} {keys %$set} for values %D;
}
[download]

The performance will be quadratic, I'm afraid.

Abigail

Comment on Re: Sorting by geographical proximity / clumping groups of items based on X and Y Download Code

Replies are listed 'Best First'.
Re: Re: Sorting by geographical proximity / clumping groups of items based on X and Y by t'mo (Pilgrim) on Jul 18, 2002 at 16:09 UTC
Shouldn't step 3 in your algorithm description: \|S(e)\| <= \|S(f)\| read \|S(e)\| > \|S(f)\| ? (Not having been a Math major, of course, could lead me to misinterpret "\|S(t)\|", which here I'm interpreting as "the size of set t".) Anyway, I had intended to suggest an algorithm dealing with sorting based on the lengths of line segments, but I now realize that's probably not much use in this case (since vroom wants to find the clumps, not necessarily a point). And reading further down the page, I like Ntromda's idea, but I, like him/her, wouldn't know how to begin coding it (without thinking about it for a long while...).	[reply]

Replies are listed 'Best First'.

Re: Re: Sorting by geographical proximity / clumping groups of items based on X and Y
by t'mo (Pilgrim) on Jul 18, 2002 at 16:09 UTC

|S(e)| <= |S(f)|

|S(e)| > |S(f)|

[reply]

In Section Seekers of Perl Wisdom