Re: Matching non-words

Frequency analysis seems like the best possibility. Supposing you can normalize subjects, how about using regexes to build heuristics?

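# Collapse every run of non-word characters into a single space.
$subject =~ tr/a-zA-Z0-9_/ /cs;
(my $candidate = $potential_match) =~ tr/a-zA-Z0-9_/ /cs;
my @words = split ' ', $subject;

# Each word is optional and must end on a word boundary.
my $regex = join '', map { '(?:(' . quotemeta($_) . ')\b\s*)?' } @words;

# Groups that didn't participate come back as undef, so count only defined ones.
my $num_matches = grep { defined } $candidate =~ /^\s*$regex/;

if ($num_matches == @words - 1) {
    register($words[-1]);
}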

If the first @words - 1 tokens match (everything but the last word), there's a good possibility the last piece is unique.
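If you want to try the heuristic on a few sample strings, here is a rough, self-contained sketch that wraps the snippet in a sub. The name `unique_tail` is hypothetical, and so is the decision to normalize both strings the same way and to skip single-word subjects. It returns the subject's last word when exactly @words - 1 words match, and undef otherwise.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the last word of $subject when all but one
# of its words match at the start of $potential_match, else undef.
sub unique_tail {
    my ($subject, $potential_match) = @_;

    # Normalize both strings identically: non-word runs become one space.
    (my $norm   = $subject)         =~ tr/a-zA-Z0-9_/ /cs;
    (my $target = $potential_match) =~ tr/a-zA-Z0-9_/ /cs;
    my @words = split ' ', $norm;

    # Assumption: a one-word subject gives the heuristic nothing to compare.
    return undef if @words < 2;

    # One optional capture group per word, each ending on a word boundary.
    my $pattern = join '', map { '(?:(' . quotemeta($_) . ')\b\s*)?' } @words;

    # Unmatched optional groups are undef; count only the ones that matched.
    my $matched = grep { defined } $target =~ /^\s*$pattern/;

    return $matched == @words - 1 ? $words[-1] : undef;
}

my $tail = unique_tail('Re: Matching non-words', 'Re: Matching non-stuff');
print "$tail\n" if defined $tail;    # prints "words"
```

Because every group is optional and nothing anchors the end of the pattern, the first greedy attempt always succeeds, so the count reflects how many words matched in order from the start.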

Food for thought.
