P is for Practical PerlMonks

### Re: Re: Sorting by geographical proximity / clumping groups of items based on X and Y

by vroom (His Eminence)
 on Jul 18, 2002 at 12:37 UTC ( #182777=note: print w/replies, xml ) Need Help??

I would say the most likely definition would be that all items within a clump are at most X feet from any other item in the clump. X would probably have a default of 300 feet or so but could be user-definable.
• Comment on Re: Re: Sorting by geographical proximity / clumping groups of items based on X and Y

Replies are listed 'Best First'.
Re: Sorting by geographical proximity / clumping groups of items based on X and Y
by Abigail-II (Bishop) on Jul 18, 2002 at 13:10 UTC
Despite what I just said about NP-completeness, the following algorithm might give a reasonable solution (certainly not optimal).
1. Let C be the set of all complaints.
2. For each complaint c in C find the set S(c) of all complaints d in C such that the distance between c and d is less than X (X is user defined).
3. Find complaint e in C such that |S(e)| <= |S(f)| for all f in C; that is, find the complaint who has the most other complaints nearby. Pick a random one in case of a tie.
4. Make a clump out of e and S(e).
5. Remove all complaints g in S(e) from C. For all h remaining in C, remove from S(h) all g in S(e).
6. If C is empty, we're done. Else, goto 3.
Some pseudo code:
```# Get set of all complaints.
my @C = get_all_complaints;
# Find all the associated sets.
my %D = map {my \$c = \$_;
\$c => {map  {\$_ => 1}
grep {\$c ne \$_ && distance (\$c, \$_) < \$X} @C}} @C;
while (%D) {
# Find complaint with the most nearby.
my (\$complaint, \$size) = (undef, 0);
while (my (\$c, \$set) = each %D) {
(\$complaint = \$c, \$size = keys %\$set) if keys %\$set > \$size;
}
# Found largest, make a clump.
make_clump (\$complaint, @{\$D {\$complaint}});
# Delete largest from set.
my \$set = delete \$D {\$complaint};
# Delete associated set from set.
delete @D {keys %\$set};
# Delete associated set from associated sets.
delete @{\$_} {keys %\$set} for values %D;
}
The performance will be quadratic, I'm afraid.

Abigail

Shouldn't step 3 in your algorithm description:
|S(e)| <= |S(f)|
|S(e)| > |S(f)|
? (Not having been a Math major, of course, could lead me to misinterpret "|S(t)|", which here I'm interpreting as "the size of set t".)
Anyway, I had intended to suggest an algorithm dealing with sorting based on the lengths of line segments, but I now realize that's probably not much use in this case (since vroom wants to find the clumps, not necessarily a point). And reading further down the page, I like Ntromda's idea, but I, like him/her, wouldn't know how to begin coding it (without thinking about it for a long while...).
Re: Sorting by geographical proximity / clumping groups of items based on X and Y
by Abigail-II (Bishop) on Jul 18, 2002 at 12:46 UTC
But you need more than that, otherwise the "Put each complaint in its own clump" is still a solution. It's tempting to add the restriction that you want to minimize the number of clumps, but that smells very much to an NP-complete covering problem.

Abigail

Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://182777]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others avoiding work at the Monastery: (3)
As of 2022-12-08 02:06 GMT
Sections?
Information?
Find Nodes?
Leftovers?
Voting Booth?

No recent polls found

Notices?