in reply to Best way to look-up a string amongst 100 million of its peers

Of course the solution strongly depends on your exact problem (how long are the words? how often do they change? how much time can you invest in building an index? how often do you run queries?).

One thing you could do is keep a sorted file of fixed-length records and perform a binary search on it. With 100 million records that is about 27 seeks per lookup.
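To make that concrete, here is a minimal sketch in Python, assuming ASCII words that fit in a 16-byte, space-padded record. The record length, the padding scheme, and the names `RECORD_LEN`, `write_sorted_file` and `lookup` are my own assumptions for illustration, not anything from the original question.

```python
import os

RECORD_LEN = 16  # assumed record size in bytes; pick one that fits your longest word


def write_sorted_file(path, words):
    """Write the words as sorted, space-padded, fixed-length records."""
    records = []
    for w in words:
        raw = w.encode("ascii")
        if len(raw) > RECORD_LEN:
            raise ValueError("word longer than record: %r" % w)
        records.append(raw.ljust(RECORD_LEN, b" "))
    # Sort the padded records so the file order matches the comparison in lookup().
    records.sort()
    with open(path, "wb") as f:
        for rec in records:
            f.write(rec)


def lookup(path, word):
    """Return the record number of word, or -1 if it is not in the file."""
    key = word.encode("ascii").ljust(RECORD_LEN, b" ")
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path) // RECORD_LEN
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(mid * RECORD_LEN)  # fixed-length records make this a plain multiply
            rec = f.read(RECORD_LEN)
            if rec < key:
                lo = mid + 1
            elif rec > key:
                hi = mid
            else:
                return mid
    return -1
```

Because every record has the same length, record number n always starts at byte n * RECORD_LEN, so no separate offset table is needed.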

You can use a hash to cache the most frequently used search terms, and an in-memory index of the first 2**16 (or however many) file positions that a binary search would visit.
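A rough sketch of how those two tricks could fit together, again in Python and again assuming 16-byte space-padded ASCII records in a sorted file. The class name `IndexedFile`, its `levels` and `cache_size` parameters, and the `disk_reads` counter are hypothetical and only there for illustration. The index is built by walking the top levels of the binary-search tree once and keeping those records in a dict; the cache of frequent terms is simply `functools.lru_cache`.

```python
import os
from functools import lru_cache

RECORD_LEN = 16  # assumed fixed record size, space-padded


class IndexedFile:
    def __init__(self, path, levels=16, cache_size=100_000):
        self.f = open(path, "rb")
        self.count = os.path.getsize(path) // RECORD_LEN
        self.top = {}        # record number -> record bytes for the top levels
        self.disk_reads = 0
        self._preload(0, self.count, levels)
        self.disk_reads = 0  # count only the reads done by lookups
        # Cache of recent search terms (hits and misses alike).
        self.lookup = lru_cache(maxsize=cache_size)(self._lookup)

    def _read(self, pos):
        self.disk_reads += 1
        self.f.seek(pos * RECORD_LEN)
        return self.f.read(RECORD_LEN)

    def _preload(self, lo, hi, depth):
        # Visit exactly the midpoints a binary search reaches in its first `depth` steps.
        if depth == 0 or lo >= hi:
            return
        mid = (lo + hi) // 2
        self.top[mid] = self._read(mid)
        self._preload(lo, mid, depth - 1)
        self._preload(mid + 1, hi, depth - 1)

    def _lookup(self, word):
        key = word.encode("ascii").ljust(RECORD_LEN, b" ")
        lo, hi = 0, self.count
        while lo < hi:
            mid = (lo + hi) // 2
            rec = self.top.get(mid)
            if rec is None:          # below the preloaded levels: go to disk
                rec = self._read(mid)
            if rec < key:
                lo = mid + 1
            elif rec > key:
                hi = mid
            else:
                return mid
        return -1

    def close(self):
        self.f.close()
```

With `levels=16` the dict holds at most 2**16 - 1 records, so for 100 million records only the last ten or so steps of each uncached search should need a seek.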

Or you just use a database engine in the hope that it implements the index very efficiently.
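As an illustration of the database route, here is a small sketch using SQLite through Python's standard `sqlite3` module. SQLite is just my pick for the example (no particular engine is implied above), and the table name `words` and the helpers `build_db` and `contains` are made up. Declaring the column as a primary key is what gets you the index.

```python
import sqlite3


def build_db(path, words):
    """Create (or open) the database and load the words into an indexed table."""
    con = sqlite3.connect(path)
    # The PRIMARY KEY constraint makes SQLite maintain a unique index on word.
    con.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    with con:  # one transaction for the whole bulk load
        con.executemany(
            "INSERT OR IGNORE INTO words (word) VALUES (?)",
            ((w,) for w in words),
        )
    return con


def contains(con, word):
    """Return True if word is present; the lookup goes through the index."""
    row = con.execute("SELECT 1 FROM words WHERE word = ?", (word,)).fetchone()
    return row is not None
```

Usage would look like `con = build_db("words.db", iterable_of_words)` followed by `contains(con, "apple")`. Whether this beats a hand-rolled binary search for your data is something you would have to measure.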

(I seem to recall that we had a similar question recently, I'll see if I can find the link). Update: here it is: fast lookups in files. It has some very good answers and is worth a read.
