Re: Finding similar data

It's easy to sort the words into sets if you're just matching the first 3 characters; see above. But what if the criteria are more complex? What if two words count as a match when a block of letters inside them, comprising at least x% of the total letters, is the same? You can't use keys and hashes for that. And does every word in a set have to match every other word in the set, or can words be linked by "joiner" words that match two or more words that don't match each other? Both situations create interesting algorithmic problems. Probably the best thing to do is separate your match criteria from your set criteria by making the match a sub:

my $match = sub { return substr($_[0], 0, 3) eq substr($_[1], 0, 3); };
print $match->('aaaa','aaab');
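A more complex criterion drops into the same slot. As a rough sketch of the "block of letters comprising at least x%" idea: the helper names longest_common_block and make_block_match are made up for illustration, and I'm assuming the percentage is measured against the shorter of the two words, which the question doesn't actually say. It needs Perl 5.10 or later for the // operator.

```perl
use strict;
use warnings;

# Hypothetical helper: length of the longest block of letters shared
# by two words (longest common substring, plain dynamic programming).
sub longest_common_block {
    my ($s, $t) = @_;
    my ($best, @prev) = (0);
    for my $i (1 .. length $s) {
        my @cur;
        for my $j (1 .. length $t) {
            if (substr($s, $i - 1, 1) eq substr($t, $j - 1, 1)) {
                # Extend the block that ended one character earlier in both words.
                $cur[$j] = ($prev[$j - 1] // 0) + 1;
                $best = $cur[$j] if $cur[$j] > $best;
            }
        }
        @prev = @cur;
    }
    return $best;
}

# Hypothetical factory: returns a match sub for a threshold such as 0.5 (50%).
# Assumption: the percentage is taken against the shorter word.
sub make_block_match {
    my $fraction = shift;
    return sub {
        my ($s, $t) = @_;
        my $shorter = length($s) < length($t) ? length($s) : length($t);
        return 0 unless $shorter;
        return longest_common_block($s, $t) >= $fraction * $shorter ? 1 : 0;
    };
}

my $block_match = make_block_match(0.5);
print $block_match->('walking', 'talking') ? "match\n" : "no match\n";
```

The sub it returns takes two words and returns true or false, just like the three-character version, so either one can be handed to the same set code.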

And then pass it to your set sub:

myset($match, @words);
sub myset {
    my $match = shift;
    # Do whatever set algorithm here,
    # using $match to compare words
}

This allows you to easily test multiple match criteria without messy duplication of code. I'd have included set code too, but you weren't very specific as to what you really want in terms of match / set criteria. First three letters won't get you much of anywhere.
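For what it's worth, here is a minimal sketch of one possible set sub, assuming you want the "joiner" behaviour: two words end up in the same set if a chain of pairwise matches connects them, even when they don't match each other directly. That is my guess at the requirement, not something you stated. The stricter rule, where every word must match every other word in its set, needs a different and harder algorithm.

```perl
use strict;
use warnings;

# Sketch of the "joiner" case: each new word is merged with every
# existing set that contains at least one word it matches.
sub myset {
    my ($match, @words) = @_;
    my @sets;
    for my $word (@words) {
        # Split the existing sets into those this word touches and the rest.
        my (@touched, @rest);
        for my $set (@sets) {
            if (grep { $match->($word, $_) } @$set) {
                push @touched, $set;
            }
            else {
                push @rest, $set;
            }
        }
        # The new word joins all touched sets together into one set.
        @sets = (@rest, [ (map { @$_ } @touched), $word ]);
    }
    return @sets;    # list of array refs, one per set
}

my $match = sub { return substr($_[0], 0, 3) eq substr($_[1], 0, 3); };
print join(' ', @$_), "\n" for myset($match, qw(aaab bbbc aaac bbbd));
```

This compares each word against every word already placed, so it is quadratic in the number of words. That's the price of not being able to use a hash key, and it should be fine for modest lists.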
