Just a small note. Starting with perl 5.010, you can join the stop words into a single alternation, separated by the pipe ('|') character, and the regex engine will compile the literal alternatives into a 'trie' so that all the stop words are tried at once. The perl 5.10 release notes (perl5100delta) claim that this makes matching N literal alternatives at a given position O(1) instead of O(N), i.e. roughly constant in the number of stop words (in most cases). This is similar to what Regexp::Assemble does.
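A minimal sketch of the idea, assuming a hypothetical stop-word list and sample sentence. The words are escaped with quotemeta, joined with '|', and compiled once with qr//. On perl 5.10 or later the engine should build the trie on its own, so no extra module is involved.

```perl
use strict;
use warnings;
use 5.010;    # trie optimisation of literal alternations arrived in 5.10

# Hypothetical stop-word list, for illustration only.
my @stop_words = qw(a an and the of to in);

# Escape any regex metacharacters, then join into one alternation.
my $alt = join '|', map { quotemeta } @stop_words;

# Anchor the whole token so "a" does not match inside "hat"; /i for "The".
my $stop_re = qr/^(?:$alt)\z/i;

my $text = "The cat sat on the mat in a hat";

# Split on whitespace and keep only tokens that are not stop words.
my @kept = grep { $_ !~ $stop_re } split ' ', $text;

print "@kept\n";    # prints: cat sat on mat hat
```

To see whether the optimisation kicks in, you can add `use re 'debug';` near the top; on 5.10+ the compiled program dump should include a TRIE-style node (the exact node name varies with the perl version and the /i flag).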
I think I read somewhere that you shouldn't use Regexp::Assemble with perl versions newer than 5.8 because it will muck things up.