in reply to arrays : splice or undef ?

The best approach would be to leverage the benefits of Lexical Scoping. ...a brief example...

    for ( 1 .. 100 ) {
        my @array;
        my $c = 0;
        while ( $c++ < 500_000 ) {
            push @array, rand;
        }
        print "\@array holds ", scalar( @array ), " elements.\n";
    }

On each iteration of the outer for loop, @array is declared, filled, and checked for an element count; it then falls out of scope, at which point its memory is released back to Perl for the next iteration.

If you watch Perl's memory usage during the run of this script, you will see that after the first iteration of the for loop, Perl never requires any significant additional memory.

The other thing to consider is your algorithm itself. Do you need the entire file to be slurped into an array? Or can you iterate over it line by line and process each line individually? The latter will almost always be more memory-efficient.

And finally, even if you do slurp the entire file into an array, each time you slurp it again into the same array the previous contents are discarded and that memory becomes available to Perl again. Nevertheless, careful use of lexical scoping solves a whole slew of potential problems, memory usage being only one of them.
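
As a rough illustration of the line-by-line approach, here is a minimal sketch (the filename and the per-line work are hypothetical placeholders):

    use strict;
    use warnings;

    my $file = 'big_input.txt';    # hypothetical input file

    open my $fh, '<', $file or die "Can't open $file: $!";

    # Only the current line is held in memory at any time.
    while ( my $line = <$fh> ) {
        chomp $line;
        # ... process $line here ...
    }

    close $fh;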


Dave

Re^2: arrays : splice or undef ?
by JockoHelios (Scribe) on Jun 04, 2013 at 18:18 UTC
    Though I haven't tested it, I assumed from the start that line-by-line processing would slow script execution due to the number of lines. The largest file I've run so far is 231 MB, with over 3.7 million lines.
    After the file is loaded, I do iterate over each line individually, meaning subroutines with more arrays. The additional processing arrays are of course subsets of the file array, and they're another area in which I'm trying to avoid excessive disk I/O (paging on Windows).
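    One way to keep those additional arrays from multiplying memory use is to pass references into the subroutines rather than copying subsets of the file array; a minimal sketch, with hypothetical names:

        use strict;
        use warnings;

        # Hypothetical: @file_lines holds the slurped file.
        my @file_lines = map { "line $_\n" } 1 .. 1000;

        # Hand the sub a reference plus an index range, so the
        # subset is never duplicated into a second array.
        process_chunk( \@file_lines, 0, 499 );

        sub process_chunk {
            my ( $lines_ref, $start, $end ) = @_;
            for my $i ( $start .. $end ) {
                my $line = $lines_ref->[$i];
                # ... work on $line ...
            }
        }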
    Dyslexics Untie !!!

      On my system it takes about 63/100ths of a second to read line by line through a file of 3.5 million lines that is 272 megabytes in size (that's the closest to 231MB and 3.7M lines that I happened to have lying around). That's with a no-op loop; whatever you do to process the lines of the file will consume time too, but it will consume virtually the same time whether you're iterating over lines from a file or over the elements of an array.
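
      A rough version of that no-op read loop, timed with the core Time::HiRes module (the filename is a placeholder):

          use strict;
          use warnings;
          use Time::HiRes qw(time);

          my $file = 'big_input.txt';    # hypothetical test file

          open my $fh, '<', $file or die "Can't open $file: $!";

          my $start = time;
          while ( my $line = <$fh> ) {
              # no-op: just read each line
          }
          my $elapsed = time - $start;

          close $fh;
          printf "Read %s in %.2f seconds\n", $file, $elapsed;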

      If performance is an issue, profile.
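
      For a quick comparison of two candidate approaches, the core Benchmark module can stand in before reaching for a full profiler such as Devel::NYTProf; a minimal sketch (the filename is a placeholder):

          use strict;
          use warnings;
          use Benchmark qw(cmpthese);

          my $file = 'big_input.txt';    # hypothetical test file

          cmpthese( 10, {
              slurp => sub {
                  open my $fh, '<', $file or die "Can't open $file: $!";
                  my @lines = <$fh>;
                  close $fh;
              },
              line_by_line => sub {
                  open my $fh, '<', $file or die "Can't open $file: $!";
                  while ( my $line = <$fh> ) { }
                  close $fh;
              },
          } );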


      Dave