in reply to Best way to look-up a string amongst 100 million of its peers

For speed nothing will beat a hash, BUT keeping such a big hash in memory seems difficult, and of course loading the hash from disk will take time as well.

If you need to check this list repeatedly, in the long run you will be better off with a database (nicely indexed, of course!).
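To make the database idea concrete, here is a minimal sketch using SQLite through Python's standard sqlite3 module. The table name `words`, the helper names `build_index` and `contains`, and the choice of SQLite itself are assumptions made for illustration; the original question does not say which database is available. Declaring the column as a primary key makes SQLite maintain an index on it, so each lookup is an index search rather than a full scan.

```python
import sqlite3

def build_index(conn, words):
    """Load words into a table whose primary key doubles as the lookup index."""
    # Hypothetical schema: a single indexed text column.
    conn.execute("CREATE TABLE IF NOT EXISTS words (word TEXT PRIMARY KEY)")
    # INSERT OR IGNORE silently skips words that are already present.
    conn.executemany(
        "INSERT OR IGNORE INTO words (word) VALUES (?)",
        ((w,) for w in words),
    )
    conn.commit()

def contains(conn, word):
    """Return True if the word is stored in the table."""
    row = conn.execute(
        "SELECT 1 FROM words WHERE word = ? LIMIT 1", (word,)
    ).fetchone()
    return row is not None

# In-memory database for the demo; a real list would live in a file on disk.
conn = sqlite3.connect(":memory:")
build_index(conn, ["apple", "banana", "apple", "cherry"])
print(contains(conn, "banana"))
print(contains(conn, "durian"))
```

For a list that is checked repeatedly, the loading cost is paid once and every later run only opens the file and queries it.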

Now, on the other hand, 100 million words? I don't think that many distinct words exist, so many of them must be repeated, leaving far fewer unique entries than 100 million, perhaps only a few hundred thousand. In that case a hash is a good idea.
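Whether the unique entries really shrink that much is easy to measure before committing to a design. The sketch below reads the list one line at a time and keeps only distinct entries in a set (the Python counterpart of using hash keys), so memory grows with the number of unique words rather than the number of lines. The function name `load_unique` and the one-word-per-line input format are assumptions.

```python
def load_unique(lines):
    """Collect distinct non-empty entries and count how many were seen in total."""
    unique = set()
    total = 0
    for line in lines:
        word = line.strip()
        if not word:
            continue  # skip blank lines
        total += 1
        unique.add(word)  # duplicates collapse into one entry
    return unique, total

# Any iterable of lines works, e.g. an open file handle; a small list stands in here.
sample = ["the\n", "cat\n", "the\n", "\n", "sat\n", "cat\n"]
unique, total = load_unique(sample)
print(f"{total} entries, {len(unique)} unique")
print("cat" in unique)
```

If the unique count turns out to be in the hundreds of thousands, the whole set fits comfortably in memory and each membership test is a constant-time lookup on average.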

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
