in reply to Simple Text Indexing

Couple of tiny comments on your code:

my @words = split /\s/, $line;

Might have problems if there are multiple spaces, or a mix of spaces and tabs, between words, since every extra whitespace character gives you an empty field, so

my @words = split /\s+/, $line;

Would be better, surely? Is that what your removeNullEntries is about?
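To show the difference, here is a quick sketch with a made-up line (the $line contents are just an example). One more thing: even /\s+/ leaves an empty first field if the line starts with whitespace. Perl's documented special case split ' ' (a single-space string, not a regex) splits on runs of whitespace and also skips leading whitespace, which might make removeNullEntries unnecessary altogether.

```perl
use strict;
use warnings;

# Made-up line: leading spaces, a double space, and a tab between words.
my $line = "  the quick  brown\tfox";

# split /\s/ yields an empty field for every extra whitespace character.
my @single = split /\s/, $line;

# split /\s+/ collapses runs of whitespace, but the leading
# whitespace still produces one empty first field.
my @runs = split /\s+/, $line;

# split ' ' is the awk-style special case: runs of whitespace are
# collapsed and leading whitespace is ignored.
my @awk = split ' ', $line;

print scalar(@single), " fields: ", join('|', @single), "\n";   # 7 fields: ||the|quick||brown|fox
print scalar(@runs),   " fields: ", join('|', @runs),   "\n";   # 5 fields: |the|quick|brown|fox
print scalar(@awk),    " fields: ", join('|', @awk),    "\n";   # 4 fields: the|quick|brown|fox
```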

I ended up writing a fairly complex regex to pull what I thought were "words" out of text, something like

/\w[\w'-]*\w|\w+/
rather than just grabbing strings separated by whitespace and trying to figure out later whether they're really valid words.
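A minimal sketch of how a pattern like that can be used: a global match in list context returns every match, so there is no split step at all. The sample sentence is invented for illustration. The first alternative needs at least two word characters and only allows apostrophes and hyphens in the middle, so leading or trailing quotes get dropped, while \w+ catches one-character words like "a".

```perl
use strict;
use warnings;

# Invented sample text with a contraction, a hyphenated word,
# a stray "--" and a single-quoted word.
my $text = "Don't split a well-known word -- or 'quoted' ones.";

# m//g in list context (no capture groups) returns all matches.
my @words = $text =~ /\w[\w'-]*\w|\w+/g;

print join('|', @words), "\n";   # Don't|split|a|well-known|word|or|quoted|ones
```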

And

my @stopList = ("the", "a", "an", "of", "and", "on", "in", "by", "with", "at", "he", "after", "into", "their", "is", "that", "they", "for", "to", "it", "them", "which");
Seems like it would be better off as a hash, so you can just write if (exists $stopList{$word}) instead of looping over the array for every word.
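Something like this, as a rough sketch: build the hash once from the existing list with map, then filter with grep. The @words input is hypothetical, and lower-casing each word before the lookup is my assumption about what you'd want, not something your code currently does. Perl keeps @stopList and %stopList in separate namespaces, so the names can coexist.

```perl
use strict;
use warnings;

# The existing stop list, unchanged.
my @stopList = ("the", "a", "an", "of", "and", "on", "in", "by", "with",
                "at", "he", "after", "into", "their", "is", "that", "they",
                "for", "to", "it", "them", "which");

# Build a lookup hash once: each stop word maps to 1.
my %stopList = map { $_ => 1 } @stopList;

# Hypothetical input; lc() is an assumption so "The" also matches "the".
my @words = qw(The cat sat on the mat with a hat);
my @kept  = grep { !exists $stopList{lc $_} } @words;

print join(' ', @kept), "\n";   # cat sat mat hat
```

Using exists rather than defined means the lookup still works if the hash is ever built with undef values, e.g. via a hash slice like @stopList{@stopList} = ().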


($_='kkvvttuubbooppuuiiffssqqffssmmiibbddllffss') =~y~b-v~a-z~s; print