in reply to Merging larges files by columns
As others said, some sample data would be helpful. But looking at your working-but-slow script, I see that you're looping completely through file2 for every line of file1. That's going to be brutal if file2 is very large. You could speed it up some by at least breaking out of your loop through file2 once you find your match.
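As a rough sketch of that quick fix (the loop structure and column layout here are assumed from your description, not taken from your actual script), an early return or `last` stops the scan of file2 as soon as the counter matches:

```perl
use strict;
use warnings;

# Hypothetical helper: scan file2's lines for the first one whose
# first whitespace-separated field equals $counter, then stop looking.
sub find_match {
    my ($counter, $file2_lines) = @_;
    for my $line (@$file2_lines) {
        my ($field1) = split /\s+/, $line;
        return $line if $field1 eq $counter;  # bail out early; no need to scan the rest
    }
    return undef;  # no match in file2
}
```

Inside a nested loop rather than a sub, the same idea is just `last if $field1 eq $counter;`. It helps on average, but each file1 line still costs a partial scan of file2.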
Better would be to first read file2 into a hash, with the first field (the one you match your counter against) as the keys, and then check that hash for each line of file1. If file2 is so large that reading it into a hash would cause memory problems, you could tie the hash to a DBM file, and that way the DBM library can keep as much of it on disk as necessary.
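Here's a minimal sketch of the hash approach. I'm assuming whitespace-separated columns with the join key in column 1, and I've written it as a sub over arrays of lines so the idea is clear; in your script you'd build `%lookup` while reading file2 line by line, then stream through file1:

```perl
use strict;
use warnings;

# Build a lookup hash from file2's lines once, then each file1 line
# costs a single O(1) hash lookup instead of a full scan of file2.
sub merge_by_key {
    my ($file1_lines, $file2_lines) = @_;

    my %lookup;
    for my $line (@$file2_lines) {
        my ($key, $rest) = split /\s+/, $line, 2;
        $lookup{$key} = $rest // '';
    }

    my @merged;
    for my $line (@$file1_lines) {
        my ($key) = split /\s+/, $line;
        push @merged, "$line $lookup{$key}" if exists $lookup{$key};
    }
    return @merged;
}
```

For the oversized-file2 case, the body of the loop stays the same; you'd just tie the hash first, e.g. with the core SDBM_File module (`use Fcntl; use SDBM_File; tie my %lookup, 'SDBM_File', 'file2_index', O_RDWR|O_CREAT, 0666;`), and let the DBM layer page it to disk.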