in reply to Re: Removing common words
in thread Removing common words

From the list of banned words, it seems that you're banning all one-letter and two-letter words and a handful of three-letter words. Why not just discard/skip ALL one-letter and two-letter words?

Re: Re: Re: Removing common words
by davido (Cardinal) on Apr 04, 2004 at 08:30 UTC
    That would be a good optimization if the list of "banned" words were fixed and immutable. Then program logic could deal with all one-letter and two-letter words, as well as single-digit numbers. But I stuck with the philosophy of explicitly naming the "banned" words so that the list could be maintained without diving into the program's logic. I was also thinking of the possibility that there could be a banned-word file, rather than using the __DATA__ filehandle.

    Good point though; if I hadn't been designing with maintainability and flexibility in the word list in mind, I would completely agree that there is a more efficient way to block all single-letter and double-letter words.
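
    To make the trade-off concrete, here is a minimal sketch of the two approaches side by side. It is not the code from the original post: the word list, the sample sentence, and the sub names filter_by_list and filter_by_length are all made up for illustration, and the list is held in an array rather than read from __DATA__ or a file.

```perl
use strict;
use warnings;

# Hypothetical banned-word list (an assumption for this sketch; the
# original post kept its list under __DATA__).
my @banned_words = qw(a an and i in is it of the to);

# Build a lookup hash so each word can be checked in constant time.
my %banned;
$banned{ lc($_) } = 1 for @banned_words;

# Approach 1: explicit list. Changing what is banned means editing data,
# not program logic.
sub filter_by_list {
    my @words = @_;
    return grep { !$banned{ lc($_) } } @words;
}

# Approach 2: length rule. Drops every word of one or two characters,
# whatever it is, and keeps everything longer.
sub filter_by_length {
    my @words = @_;
    return grep { length($_) > 2 } @words;
}

my @words = qw(The cat is in the box on a go kart);

print join(' ', filter_by_list(@words)),   "\n";   # cat box on go kart
print join(' ', filter_by_length(@words)), "\n";   # The cat the box kart
```

    With this sample input, the length rule is shorter but also throws away "on" and "go" and still lets "the" through, so a list would be needed for the three-letter words anyway. The explicit list only drops what it names.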


    Dave