in reply to Comparing strings (exact matches) in LARGE numbers FAST

Create a prefix tree (trie) for the dictionary that you're trying to match against. Also, add a flag to each node in the tree to mark whether or not it is a valid terminal position, i.e., whether a dictionary string ends exactly there.

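Then, for each search instance (i.e., each line in the second input file), walk the prefix tree one character at a time. If you consume the whole line and end up on a node that is also a valid terminal position, you have a match; if you run out of children before the line ends, or stop on a non-terminal node, you don't. Building the prefix tree takes time linear in the total length of the dictionary strings. Searching is bounded in the worst case by n*k steps, where n is the number of lines in the file and k is the length of the longest string that you're matching against, assuming each node can find a child in constant time (an array or hash indexed by character). If the children are instead kept in a sorted list and found by binary search, each step costs an extra factor of log(base), where base is the number of children each node can have.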
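To make the build and search steps concrete, here is a minimal sketch in Python. It is only an illustration under some assumptions: the names TrieNode, build_trie and contains are made up for this example, children are stored in a dict (so child lookup is roughly constant time), and the sample words stand in for the two input files.

```python
class TrieNode:
    """One node of the prefix tree (hypothetical name, for illustration)."""
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}      # maps a character to the child TrieNode
        self.terminal = False   # True if a dictionary string ends here


def build_trie(words):
    """Build the prefix tree; cost is linear in the total length of words."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                nxt = node.children[ch] = TrieNode()
            node = nxt
        node.terminal = True    # mark the valid terminal position
    return root


def contains(root, word):
    """Exact-match lookup: at most len(word) child lookups."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False        # ran out of children before the word ended
    return node.terminal        # a bare prefix of a longer word is not a match


# Assumed sample data standing in for the dictionary and the second file.
dictionary = ["cat", "car", "cart"]
queries = ["car", "ca", "cart", "dog"]

root = build_trie(dictionary)
matches = [q for q in queries if contains(root, q)]
print(matches)
```

Note that "ca" is not reported even though it is a prefix of every dictionary word, because the node it ends on is not marked terminal. For a real file you would strip the trailing newline from each line before inserting or looking it up.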

This approach has the added benefit that once the prefix tree is created it can be reused later. -- jbs36