in reply to Parsing a large file 80GB .gz file and making an output file with specifice columns in this original file.
I think Corion has the right idea, but I'd take it one step further and avoid having the perl script write directly to disk.
I'd do it this way:
gunzip -cd big.tsv.gz | perl -ne"@v=split chr(9),$_; $i=-1; print $v[$i+=3], chr(9) while $i < @v; print chr(10)" | perl -pe1 > newfile
NB: That's a single one-liner; any line breaks you see are just wrapping for posting.
The idea is to avoid building a second array of the columns you are keeping, and then using yet more memory to do the join; each wanted field gets printed as soon as it's split out. The trailing perl -pe1 stage is just a pass-through buffer, so the first script isn't blocked waiting on disk writes.
If you are going to be doing this regularly rather than just as a one-off, it might be worth trying both ways to see which works best on your system.