File operations aren't always as black and white as they seem. The operating system does a lot of file caching behind the scenes, so you probably won't take as big a performance hit as you'd expect if you open and close the same file several times.
Have you tried Tie::File yet? It also does caching, deferred writes, and other optimizations. More importantly, you can set an upper limit on the amount of memory you want Tie::File to consume, which could help prevent excessive swapping.
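As a rough sketch of what that looks like (the file name and the 10 MB figure are just placeholders; the memory option and the defer/flush methods are documented in Tie::File):

    use strict;
    use warnings;
    use Tie::File;

    # Cap Tie::File's in-memory cache at roughly 10 MB.
    tie my @lines, 'Tie::File', 'data.txt', memory => 10_000_000
        or die "Cannot tie data.txt: $!";

    (tied @lines)->defer;    # queue writes in memory instead of per-line
    s/foo/bar/ for @lines;   # edits accumulate in the deferred-write buffer
    (tied @lines)->flush;    # write everything out in one go

    untie @lines;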
However, that discussion aside... my main point (which I think you might have missed) was that I don't think you have to parse it 4 times. I could be wrong (as I don't know all the facts), but can't most of this be done in tandem? I.e., why can't the format fix be done at the same time as the patch?
You would be patching and formatting lines (not arrays) of data on the fly. You only need a single pass over the data, instead of several.
You could probably even do the dup checking at the same time. Just build a hash of "things seen" as you're patching/formatting, and skip any dups that appear in the hash. Pseudo-code for what I'm talking about:
while ( my $line = <INFILEHANDLE> ) {
    chomp($line);
    next if $seen{$line}++;       # dup check: skip lines we've already seen
    $line = patch_line($line);    # apply the patch
    $line = format_line($line);   # apply the format fix
    # ... any other code ...
    print OUTFILEHANDLE "$line\n";
}
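(Note the $seen{$line}++ idiom: it tests and marks in one step. Marking the line seen *before* testing, as in the original sketch, would skip every line.) For completeness, here's a self-contained version of the same loop; the file names are placeholders, and patch_line/format_line are still stubs you'd supply:

    use strict;
    use warnings;

    my %seen;
    open my $in,  '<', 'input.txt'  or die "Cannot open input.txt: $!";
    open my $out, '>', 'output.txt' or die "Cannot open output.txt: $!";

    while ( my $line = <$in> ) {
        chomp $line;
        next if $seen{$line}++;        # skip lines we've already emitted
        $line = patch_line($line);     # your patching logic
        $line = format_line($line);    # your formatting logic
        print {$out} "$line\n";
    }

    close $in;
    close $out or die "Cannot close output.txt: $!";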