in reply to Re: (dws)Re: Search Engines for Dummies
in thread Search Engines for Dummies

I've already limited the index to words of four letters or more,

Well, there goes "sex" :)

Seriously, a four-or-more-letter rule isn't very good. You risk dropping significant two- or three-letter terms (e.g., AI, XML) while still cluttering up the index with common words (e.g., were, which).

Try this simple experiment. Sort your index by the length of each line, longest first (I'm assuming one line per term, followed by the documents that contain it). Terms that appear in all or nearly all of the documents will rise to the top. Then look at the first 100 or so words. If they're not "significant" (and here you'll have to make the call on what's significant to your domain), then add them to a list of words that the indexer will ignore.
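Here is a rough sketch of that experiment in Python. I don't know your index format, so this assumes a plain-text index with one line per term: the term first, then the IDs of the documents containing it, separated by whitespace. The function names (candidate_stop_words, filter_index) and the sample data are made up for illustration.

```python
def candidate_stop_words(index_lines, top_n=100):
    """Return the terms on the longest index lines, longest first.

    Assumed line format (hypothetical): "term doc_id doc_id ...".
    A long line means the term occurs in many documents.
    """
    ranked = sorted(index_lines, key=len, reverse=True)
    terms = []
    for line in ranked[:top_n]:
        fields = line.split()
        if fields:  # skip blank lines
            terms.append(fields[0])
    return terms


def filter_index(index_lines, stop_words):
    """Drop index lines whose term is in the reviewed stop-word list."""
    stop = set(stop_words)
    kept = []
    for line in index_lines:
        fields = line.split()
        if fields and fields[0] not in stop:
            kept.append(line)
    return kept


# Made-up sample index, purely for illustration.
sample = [
    "xml doc3 doc7",
    "which doc1 doc2 doc3 doc4 doc5 doc6 doc7",
    "were doc1 doc2 doc3 doc4 doc6 doc7",
    "ai doc2",
]

candidates = candidate_stop_words(sample, top_n=2)
print(candidates)
print(filter_index(sample, candidates))
```

The review step in the middle is still manual: look over the candidates and keep any that matter in your domain before passing the rest to filter_index. Note that line length is only a stand-in for document count, so long document IDs can skew the ranking; counting the fields on each line would be more exact.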
