in reply to (dws)Re: Search Engines for Dummies
in thread Search Engines for Dummies

Assuming a 400KB file is read in 8K chunks (by the underlying OS buffering), a linear scan costs 25 reads on average and 50 in the worst case.

I'll also assume that you're reading the file from the beginning. Why not do a binary search, starting in the middle? That's O(log n) comparisons instead of O(n), so it would cut the number of reads way down if you stick with the flat file.
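For what it's worth, the byte-offset binary search over a sorted flat file can be sketched like this (Python rather than Perl, and the file layout is assumed to be one word per line, sorted — the trick is seeking to the middle of the byte range, then aligning to the next line boundary):

```python
import os

def find_word(path, target):
    """Binary-search a sorted, newline-delimited word file by byte offset."""
    target = target.encode()
    with open(path, "rb") as f:
        size = f.seek(0, os.SEEK_END)
        lo, hi = 0, size              # candidate line starts lie in [lo, hi)
        while True:
            mid = (lo + hi) // 2
            if mid <= lo:
                break                 # window narrowed to the line at lo
            f.seek(mid - 1)
            f.readline()              # align to the first line start >= mid
            start = f.tell()
            if start >= hi:
                hi = mid              # no full line starts in [mid, hi)
                continue
            word = f.readline().rstrip(b"\r\n")
            if word == target:
                return True
            if word < target:
                lo = f.tell()         # target must start after this line
            else:
                hi = mid              # target must start before mid
        f.seek(lo)                    # final candidate: the line at lo
        return f.readline().rstrip(b"\r\n") == target
```

On a 400KB file that's roughly 19 seek-and-read probes worst case instead of 50 sequential chunk reads.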

Update: You could also divide up the file by the frequency of words' starting letters in the language of choice (English?). Or, begin the flat file with a header listing how many words start with a, b, ..., compare that against the search word's first letter, seek that far into the file, and perhaps do a binary search within a letter's block if many words start with that letter.
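A sketch of that letter-offset idea (again in Python, with made-up helper names): one pass over the sorted file records the byte offset where each starting letter's block begins, and a lookup then scans only that block:

```python
def build_letter_offsets(path):
    """One pass: record the byte offset where each starting letter begins."""
    offsets = {}
    with open(path, "rb") as f:
        pos = 0
        for line in f:
            letter = line[:1]
            if letter not in offsets:
                offsets[letter] = pos
            pos += len(line)
    return offsets

def lookup(path, word, offsets):
    """Seek straight to the word's letter block and scan only that block."""
    target = word.encode()
    letter = target[:1]
    if letter not in offsets:
        return False
    with open(path, "rb") as f:
        f.seek(offsets[letter])
        for line in f:
            w = line.rstrip(b"\r\n")
            if w[:1] != letter or w > target:
                return False          # left the block, or passed the slot
            if w == target:
                return True
    return False
```

Within a big block (words starting with 's', say) you could swap the scan for the binary search above, bounded to that block's byte range.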

Re: Re: (dws)Re: Search Engines for Dummies
by dws (Chancellor) on Feb 26, 2001 at 21:57 UTC
    You could have the indexer remember the starting position of each letter in the alphabet, and emit a separate require-ready file of these positions. That would indeed cut the search down. Or you could sort the index by word length, and remember those starting positions.

    Note that in either case you're building an index, and have set one foot on the slippery slope toward using a DBM or SQL database. Why not go all the way?

      I think I would try using a db just for the exercise, but I was under the impression that his file was small enough that a db would add more overhead than it's worth, and might not justify the effort of learning a db module.