
You have to put one of the two files into a hash, it doesn't really matter which one.

Actually, there's a good chance that it does matter. If one file has about 2 million rows/keys and the other has about 8 million, it will take noticeably less time and memory to store the keys of the smaller file in a hash. As GrandFather suggested above, there's a reasonable chance that a hash of 2 million elements could fit into RAM without causing the machine to thrash as virtual memory pages get bounced back and forth between RAM and the swap file.
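To make that concrete, here is a minimal sketch in Python, with a set standing in for the hash. It assumes the key is the first tab-separated field of each line and that the goal is to report keys present in both files; neither detail comes from the thread, and the names `first_field` and `common_keys` are made up for illustration.

```python
import os


def first_field(line):
    # Assumption: the key is the first tab-separated field of the line.
    return line.split("\t", 1)[0].rstrip("\n")


def common_keys(path_a, path_b):
    """Yield keys that occur in both files, holding only the smaller file in memory."""
    # File size on disk is a cheap proxy for row count.
    small, large = sorted((path_a, path_b), key=os.path.getsize)

    # Build the lookup table from the smaller file only.
    with open(small) as fh:
        seen = {first_field(line) for line in fh}

    # Stream the larger file one line at a time; it is never held in memory.
    with open(large) as fh:
        for line in fh:
            key = first_field(line)
            if key in seen:
                yield key
```

Memory use is then bounded by the smaller file's key count, while the larger file is only read once from start to finish.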

But whether it's in memory or in a DBM file of some sort, creating 2 million keys will be quicker than creating 8 million (and it just seems to make more sense). Of course, once a hash has been built, access time is not likely to differ all that much (except when an "in-memory" hash is big enough to induce swapping), but the time and space needed to build the hash may differ significantly depending on the number of elements involved.
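For the on-disk variant, a rough sketch using Python's standard `dbm` module might look like the following. It assumes one key per line; the function names and the placeholder value `b"1"` are hypothetical, and which DBM backend actually gets used depends on what is installed on the machine.

```python
import dbm


def build_key_db(keys_path, db_path):
    """Store each line of keys_path as a key in an on-disk DBM file.

    Returns the number of lines read (duplicates are counted each time).
    """
    count = 0
    with dbm.open(db_path, "c") as db, open(keys_path) as fh:
        for line in fh:
            # The value is a placeholder; only the key's presence matters.
            db[line.rstrip("\n").encode()] = b"1"
            count += 1
    return count


def keys_found(probe_path, db_path):
    """Yield each line of probe_path that exists as a key in the DBM file."""
    with dbm.open(db_path, "r") as db, open(probe_path) as fh:
        for line in fh:
            key = line.rstrip("\n")
            if key.encode() in db:
                yield key
```

The build cost grows with the number of keys written, so here too it pays to pass the smaller file to `build_key_db` and probe with the larger one.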