in reply to Perl Filehandle?

Are you sure the slowdown is occurring at the filehandle close and not somewhere else in the program?

Do all those arrays need to remain in memory for the duration of the program?
Not knowing what your functions do makes it hard to offer suggestions.

Have you tried moving patch_stockfiles/delete_duplicates/fix_format/output inside the initial input loop and eliminating the array entirely?
Based on those subroutine names, it seems as if at least some of that can be done from inside the loop.
As is, it seems as if you're iterating over that huge array 4 separate times after creating it.

My suggestion would be to find a way to process this data piecewise (or in tandem), rather than as a whole. Is this possible?

Re^2: Perl Filehandle?
by Smersh2000 (Initiate) on Nov 14, 2006 at 20:47 UTC
    Thank you for the reply. Originally, that is what I had in mind: work on the files without loading the whole file into memory. But then I thought: given that I need to parse it once for patching, twice for deleting duplicates, and once for fixing the format, wouldn't I lose time on opening/closing files? I think I even tried to do this once in another script, and opening the file alone took some time. Was I wrong?
      File operations aren't always as black and white as they seem. The operating system does a lot of file caching behind the scenes, so you probably won't take as bad a performance hit as you'd expect if you open and close the same file several times.

      Have you tried Tie::File yet?
      It also does caching and deferred writes and other types of optimization.
      More importantly, you can give an upper limit on the amount of memory you want Tie::File to consume, which could possibly prevent excessive swapping.
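      To make that concrete, here is a minimal sketch of Tie::File with its memory cap. The filename and the 20 MB limit are just placeholders for illustration; the tied array maps one element per line of the file, and edits write through to disk without slurping the whole file:

      ```perl
      use strict;
      use warnings;
      use Tie::File;   # core module since Perl 5.8

      # 'stockfile.txt' is a hypothetical filename; 'memory' caps
      # Tie::File's read cache at roughly 20 MB (default is 2 MB)
      tie my @lines, 'Tie::File', 'stockfile.txt', memory => 20_000_000
          or die "Cannot tie stockfile.txt: $!";

      # Each element is one line (record) of the file; assigning to an
      # element rewrites that line on disk
      for my $line (@lines) {
          # ... patch/format $line in place here ...
      }

      untie @lines;   # flush and release the file
      ```

      Note that random writes through a tied array can still be slow on very large files, since inserting or shortening a record forces Tie::File to shift everything after it; sequential read-modify-write through a plain filehandle is usually faster for a one-pass job.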

      However, that discussion aside, my main point (which I think you might have missed) was that I don't think you have to parse it 4 times. I could be wrong (as I don't know all the facts), but can't any of this be done in tandem? I.e., why can't the format fix be done at the same time as the patch?

      You would be patching and formatting lines (not arrays) of data on the fly. You only need a single iteration over all that data, instead of several. You could probably even do the dup checking at the same time: just build a hash of "things seen" as you're patching/formatting, and skip any dups that appear in the hash. Pseudo-code for what I'm talking about:

      while ( my $line = <INFILEHANDLE> ) {
          chomp($line);
          next if $seen{$line}++;      # skip dups; mark the line as seen only after checking
          $line = patch_line($line);   # apply the patch to this line
          $line = format_line($line);  # fix the format of this line
          # ... any other code ...
          print OUTFILEHANDLE "$line\n";
      }