http://qs1969.pair.com?node_id=363554


in reply to join on 6 huge files

Dwite,

If I understand your problem correctly (your description was a bit hazy), you simply do not want to load these huge files into memory. Right?

I would open all the files as streams, much in the spirit of that old Unix standby "sed". Go through the files line by line, as if you were running a batch process: read one line from each input stream, munge those lines as needed, concatenate the results, and write the joined line to the destination file. Only one line per file is in memory at any time, never a whole file. It's quick, simple, and elegant.
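Here is a rough sketch of that idea, written in Python only for illustration. It is not a finished tool: the function name join_files_by_line, the tab separator, and the choice to pad shorter files with empty fields are all my assumptions, and it assumes the files line up row for row rather than needing a lookup by key.

```python
from contextlib import ExitStack
from itertools import zip_longest


def join_files_by_line(input_paths, output_path, sep="\t"):
    """Read all inputs in lockstep and write one joined line per row.

    Hypothetical helper: assumes row N of every input belongs together.
    Returns the number of rows written.
    """
    rows = 0
    with ExitStack() as stack:
        # Open every input as a stream; nothing is read into memory yet.
        sources = [
            stack.enter_context(open(path, "r", encoding="utf-8"))
            for path in input_paths
        ]
        dest = stack.enter_context(open(output_path, "w", encoding="utf-8"))

        # zip_longest pulls one line from each file per iteration.
        # Assumption: a file that runs out early contributes empty fields.
        for lines in zip_longest(*sources, fillvalue=""):
            # Per-line munging goes here; this sketch only strips newlines.
            fields = [line.rstrip("\n") for line in lines]
            dest.write(sep.join(fields) + "\n")
            rows += 1
    return rows
```

You would call it with your six input paths and one output path. Memory use depends on the longest line, not on file size, so the same loop should behave the same on much larger inputs.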

I have written several tools that take this sed-like tack. Some of my files are as large as 12 MB, and the tools never choke on them.