in reply to Matching non-words

Frequency analysis seems like the best possibility. Supposing you can normalize subjects, how about using regexes to build heuristics?
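$subject =~ tr/a-zA-Z0-9_/ /cs;   # squeeze each run of non-word characters into one space
my @words = split(' ', $subject);
my $regex = join(' ', map { "($_)?" } @words);
# A list-context match yields one slot per group, undef for groups that did not participate.
my @captures = $potential_match =~ $regex;
my $num_matches = grep { defined } @captures;
if ($num_matches == @words - 1 && !defined $captures[-1]) {
    register($words[-1]);
}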
If the first @words - 1 tokens match and the last one does not, there's a good possibility the last piece is unique.
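To make the idea easier to poke at, here is a rough, self-contained sketch of the same heuristic wrapped in a subroutine. The names normalize and unique_tail and the sample subjects are made up for illustration, and it assumes both strings get normalized the same way before comparing.

```perl
use strict;
use warnings;

# Hypothetical helper: same normalization as the tr/// above, plus trimming.
sub normalize {
    my ($text) = @_;
    $text =~ tr/a-zA-Z0-9_/ /cs;          # runs of non-word characters become one space
    return join ' ', split ' ', $text;    # drop leading/trailing whitespace
}

# Hypothetical helper: in scalar context, returns the last token of $subject
# when every earlier token matches in $candidate and the last one does not;
# returns undef otherwise.
sub unique_tail {
    my ($subject, $candidate) = @_;
    my @words = split ' ', normalize($subject);
    return unless @words > 1;

    # Tokens contain only word characters here, so no quotemeta is needed.
    my $regex = join ' ', map { "($_)?" } @words;

    # One slot per capture group; undef where the group did not participate.
    my @captures = normalize($candidate) =~ /$regex/;
    my $num_matches = grep { defined } @captures;

    return $words[-1] if $num_matches == @words - 1 && !defined $captures[-1];
    return;
}

# Made-up sample subjects.
my $tail = unique_tail('Weekly report: build 1041', 'Weekly report: build 1042');
print defined $tail ? "unique piece: $tail\n" : "no unique piece\n";
```

Because every group is optional, the regex is happy to match at the first spot where the literal spaces line up, so treat this as a cheap first filter rather than a real comparison.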

Food for thought.