in reply to Best way to search file
If this is the case, then you will find that storing file2 in a hash before you start processing file1 makes the whole thing dramatically faster. And the larger file2 is, the bigger the speed gain.
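To make the idea concrete, here is a minimal sketch of the hash approach. It assumes that the lookup key is the whole line, which may not match your actual data, and the sub names `load_keys` and `matching_lines` are made up for illustration. In-memory filehandles stand in for the real files so that the snippet runs on its own.

```perl
use strict;
use warnings;

# Load every line of file2 into a hash, so that each later lookup
# is a constant-time operation on average.
# Assumption: the whole (chomped) line is the key.
sub load_keys {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        $seen{$line} = 1;
    }
    return \%seen;
}

# Return the lines of file1 that also appear in file2.
sub matching_lines {
    my ($fh1, $seen) = @_;
    my @found;
    while (my $line = <$fh1>) {
        chomp $line;
        push @found, $line if exists $seen->{$line};
    }
    return @found;
}

# Demo data: in-memory filehandles standing in for the real files.
my $file2_data = "banana\ncherry\nkiwi\n";
my $file1_data = "apple\nbanana\ncherry\ndate\n";
open my $fh2, '<', \$file2_data or die "Cannot open file2: $!";
open my $fh1, '<', \$file1_data or die "Cannot open file1: $!";

my $seen  = load_keys($fh2);
my @found = matching_lines($fh1, $seen);
print "$_\n" for @found;
```

With real files, you would open them by name instead of by reference to a string. If the key is only one field of each line, split the line and use that field as the hash key.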
As mentioned by sundialsvc4, the only limit is memory: if file2 is so big that the hash would take up all the available RAM, then the hash is no longer a solution. (It depends on your system, but with today's typical RAM, my experience is that the limit could be somewhere between 5 and 15 million lines for file2.)
In that case, I would really recommend sorting both files and then reading them sequentially in parallel, advancing through each one in step. In my experience with huge files, this is way faster than using a database. The only downside of this approach is that the algorithm for reading two files in parallel can be a bit tricky, with quite a few edge cases to take care of.
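A rough sketch of such a parallel read could look like the following. It assumes that both files are already sorted in the same plain string order that Perl's `cmp` uses (for example with `sort` under `LC_ALL=C`), and that the whole line is the key. The sub names `next_line` and `merge_matches` are only illustrative. Duplicates in either file and one file ending before the other are the edge cases it tries to cover; your real data may well have others.

```perl
use strict;
use warnings;

# Read one line and strip the newline; returns undef at end of file.
sub next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    chomp $line if defined $line;
    return $line;
}

# Walk two sorted inputs in step and collect the lines of the first
# that also occur in the second. Only one line of each input is held
# in memory at any time.
sub merge_matches {
    my ($fh1, $fh2) = @_;
    my @common;
    my $l1 = next_line($fh1);
    my $l2 = next_line($fh2);
    while (defined $l1 and defined $l2) {
        my $order = $l1 cmp $l2;
        if ($order < 0) {
            $l1 = next_line($fh1);    # file1 is behind: advance it
        }
        elsif ($order > 0) {
            $l2 = next_line($fh2);    # file2 is behind: advance it
        }
        else {
            push @common, $l1;        # match found
            $l1 = next_line($fh1);    # keep $l2, so duplicates in file1 still match
        }
    }
    return @common;
}

# Demo data: already sorted, with a duplicate line in the first input.
my $sorted1 = "apple\nbanana\nbanana\ndate\n";
my $sorted2 = "banana\ncherry\ndate\n";
open my $in1, '<', \$sorted1 or die "Cannot open first input: $!";
open my $in2, '<', \$sorted2 or die "Cannot open second input: $!";

my @common = merge_matches($in1, $in2);
print "$_\n" for @common;
```

The loop stops as soon as either input is exhausted, since no further match is possible after that. In a real program you would probably print each match as you find it rather than collecting everything in an array, which would defeat the purpose on really huge files.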