in reply to join on 6 huge files
If you are certain that the SHA entries are going to be in the same order in all six of the files, then reading from each file, one entry at a time, would probably be best.
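A rough sketch of that lockstep read, in case it helps. The line layout (`<sha> <hex value>`, whitespace-separated) and the sub name `join_in_lockstep` are my assumptions, not something from the original question, so adjust the `split` to your real format.

```perl
use strict;
use warnings;

# Read one entry from each input handle per step and write one joined line.
# ASSUMPTION: every line looks like "<sha> <hex value>".
sub join_in_lockstep {
    my ($inputs, $out) = @_;
    my $first = $inputs->[0];
    while (defined(my $line = <$first>)) {
        chomp $line;
        my ($sha, $value) = split ' ', $line, 2;
        my @values = ($value);
        for my $i (1 .. $#$inputs) {
            my $fh    = $inputs->[$i];
            my $other = <$fh>;
            die "input $i ended early at SHA $sha\n" unless defined $other;
            chomp $other;
            my ($other_sha, $other_value) = split ' ', $other, 2;
            # Bail out loudly if the "same order" assumption turns out false.
            die "order mismatch: $sha vs $other_sha in input $i\n"
                unless $other_sha eq $sha;
            push @values, $other_value;
        }
        print {$out} join(' ', $sha, @values), "\n";
    }
}
```

You would open the six files, pass the handles in as an array ref along with an output handle, and only one line per file is ever in memory. The `die` on a mismatch is there because a silent misalignment would be far worse than a crash.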
If the SHA entries in the files aren't in the same order, then I would try to cat the files together, sort them with the sort tool (which can probably sort 56 million lines much faster than Perl), and then let a script read the sorted result.
Slurping shouldn't be necessary... just keep an array of hex strings for the current SHA number and dump it to the out file when you hit a new SHA.
This is just a SWAG, though, and I can't guarantee that sort won't get pathological on you with 56 million lines.
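For the sorted route, something along these lines is what I have in mind. Again the `<sha> <hex value>` layout and the sub name `group_sorted` are assumptions of mine, and this is only a sketch of the "keep an array for the current SHA" idea, not tested against data of your size.

```perl
use strict;
use warnings;

# Collapse a stream that is already sorted by SHA into one line per SHA.
# ASSUMPTION: every line looks like "<sha> <hex value>".
sub group_sorted {
    my ($in, $out) = @_;
    my ($current, @values);
    while (defined(my $line = <$in>)) {
        chomp $line;
        my ($sha, $value) = split ' ', $line, 2;
        # A new SHA means the previous group is complete, so dump it.
        if (defined $current && $sha ne $current) {
            print {$out} join(' ', $current, @values), "\n";
            @values = ();
        }
        $current = $sha;
        push @values, $value;
    }
    # Flush the final group, if there was any input at all.
    print {$out} join(' ', $current, @values), "\n" if defined $current;
}
```

Called as `group_sorted(\*STDIN, \*STDOUT)`, it can sit at the end of a pipe fed by sort. One thing to watch: within a SHA the values come out in whatever order sort left the lines, not in file order, so if you need to know which file a value came from you would have to tag each line before sorting.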
Re^2: join on 6 huge files
by ambrus (Abbot) on Jun 10, 2004 at 21:07 UTC

Re^2: join on 6 huge files
by pbeckingham (Parson) on Jun 10, 2004 at 13:46 UTC