Despite what I just said about NP-completeness, the following algorithm
might give a reasonable solution (certainly not optimal).
- Let C be the set of all complaints.
- For each complaint c in C, find the set S(c) of all other complaints d in C
such that the distance between c and d is less than X (X is user-defined).
- Find the complaint e in C such that |S(e)| >= |S(f)| for all f in C;
that is, find the complaint that has the most other complaints nearby.
Pick a random one in case of a tie.
- Make a clump out of e and S(e).
- Remove e and all complaints g in S(e) from C. For all h remaining in C,
remove from S(h) all g in S(e).
- If C is empty, we're done. Otherwise, go back to the step that finds e.
Some pseudo code:
# Get set of all complaints.
my @C = get_all_complaints;
# Find all the associated sets.
my %D = map {my $c = $_;
             $c => {map {$_ => 1}
                    grep {$c ne $_ && distance ($c, $_) < $X} @C}} @C;
while (%D) {
    # Find complaint with the most nearby. Start at -1 so that a
    # complaint with no neighbours left can still be picked.
    my ($complaint, $size) = (undef, -1);
    while (my ($c, $set) = each %D) {
        ($complaint = $c, $size = keys %$set) if keys %$set > $size;
    }
    # Found largest, make a clump. The associated set is a hash, so
    # pass its keys.
    make_clump ($complaint, keys %{$D {$complaint}});
    # Delete largest from set.
    my $set = delete $D {$complaint};
    # Delete associated set from set.
    delete @D {keys %$set};
    # Delete associated set from associated sets.
    delete @{$_} {keys %$set} for values %D;
}
The performance will be quadratic in the number of complaints, I'm afraid.
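To make the quadratic cost visible, here is a minimal self-contained sketch of the same greedy loop run on made-up data. Everything specific in it is an assumption for illustration: the %pos coordinates, the Euclidean distance, the clumps helper that returns the clumps instead of calling make_clump, and the sorted tie-break (used instead of a random pick so the output is repeatable). The counter $calls only records how often distance is called.

```perl
use strict;
use warnings;

# Hypothetical sample data: complaint id => [x, y].
my %pos = (
    a => [0, 0],   b => [1, 0],   c => [0, 1],
    d => [10, 10], e => [11, 10],
    f => [50, 50],
);
my $X     = 2;    # user-defined distance threshold
my $calls = 0;    # number of distance () calls made so far

# Assumed metric: Euclidean distance between two complaint ids.
sub distance {
    my ($p, $q) = @pos {@_};
    $calls ++;
    return sqrt (($p -> [0] - $q -> [0]) ** 2 + ($p -> [1] - $q -> [1]) ** 2);
}

# Returns a list of clumps, each an array ref [centre, neighbours...].
sub clumps {
    my @C = sort keys %pos;
    # Associated sets: one distance () call per ordered pair.
    my %D = map {my $c = $_;
                 $c => {map {$_ => 1}
                        grep {$c ne $_ && distance ($c, $_) < $X} @C}} @C;
    my @clumps;
    while (%D) {
        # Start at -1 so an isolated complaint (empty set) is still picked.
        my ($complaint, $size) = (undef, -1);
        for my $c (sort keys %D) {    # sorted: first one wins a tie
            my $n = keys %{$D {$c}};
            ($complaint, $size) = ($c, $n) if $n > $size;
        }
        my $set = delete $D {$complaint};
        push @clumps => [$complaint, sort keys %$set];
        delete @D {keys %$set};                    # drop clumped complaints
        delete @{$_} {keys %$set} for values %D;   # and forget them elsewhere
    }
    return @clumps;
}

print join (" ", @$_), "\n" for clumps ();
print "distance calls: $calls\n";
```

With these six points it prints the clumps "a b c", "d e" and "f", followed by "distance calls: 30", which is n * (n - 1) for n = 6. Since the distance is symmetric, computing each unordered pair once would halve that, but it stays quadratic.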
Abigail