in reply to Large file efficiency

Since I am working on a cluster, I'm at the mercy of the network for file reads. That means one large read is usually faster than many small ones. Furthermore, since I am pruning data files, I also need to adjust the headers, and being rather new at this, the best approach seemed to be to slurp in the data, prune out what I don't want, adjust the headers, and write everything back out.

For files small enough to fit in RAM (16 GB), this method is a factor of 10 faster than reading and writing one line at a time and then going back to adjust the headers, which costs yet another pass over the file. However, when the slurp pushes me into swap, everything slows way down.
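For what it's worth, here is a minimal sketch of that slurp-prune-rewrite pattern in Python. The post doesn't specify a language or file format, so the header convention here (a first line carrying the record count, `count=N`) and the `prune_file` helper are hypothetical stand-ins for illustration only:

```python
def prune_file(text, keep):
    """Slurp-style prune: take the whole file contents as one string,
    drop unwanted data lines, and fix up the header in memory before
    anything is written back out.

    text -- entire file contents (one header line, then data lines)
    keep -- predicate deciding which data lines survive
    """
    lines = text.splitlines()
    header, data = lines[0], lines[1:]
    kept = [line for line in data if keep(line)]
    # Hypothetical header convention: "count=N" records how many
    # data lines follow; rewrite it to match the pruned data.
    new_header = f"count={len(kept)}"
    return "\n".join([new_header, *kept]) + "\n"

# Usage: prune comment lines, letting the header count get fixed up.
original = "count=4\nA 1\n# junk\nB 2\n# junk\n"
pruned = prune_file(original, keep=lambda line: not line.startswith("#"))
# pruned == "count=2\nA 1\nB 2\n"
```

The whole file lives in memory at once (one network read, one write), which is exactly why it wins until the data outgrows RAM and swap takes over.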