JockoHelios has asked for the wisdom of the Perl Monks concerning the following question:

I'm using strict and warnings for my code, working with data files which can run from 85MB to over 200MB each. After processing is complete on one file, I've been using

splice( @FileData );

to clear the array and release the RAM back to Perl before loading the next file. I understand that this doesn't release the RAM back to the OS; that's fine.

Would using

undef( @FileData );

be a better choice than splice, or just a different one ?

Using either method with a test script, Perl doesn't openly acknowledge the existence of the array after it's been undef'd or spliced. However, it apparently still "exists" because I don't have to redeclare it with

splice( my @FileData );
or
undef( my @FileData );

The commented splice is in the test script because I tried both methods of array declaration; I get the same output either way.
#!C:\Perl\bin
use strict;
use warnings;

my $SampleLine = "Perl is addictive. There should be a warning label or something\.\.\.\n";
my $Count = 0;

#splice( my @TestArray );
undef( my @TestArray );

push( @TestArray, $SampleLine );
$Count = scalar @TestArray;
print "\npush Count is $Count\n";
if ( @TestArray ) { print"\nPush - TestArray IS here\n\n"; }
else { print"\nAfter Push - TestArray is NOT here\n\n"; }

undef( @TestArray );
$Count = scalar @TestArray;
print "\nundef Count is $Count\n";
if ( @TestArray ) { print"\nUnDef - TestArray IS here\n\n"; }
else { print"\nAfter UnDef - TestArray is NOT here\n\n"; }

push( @TestArray, $SampleLine );
$Count = scalar @TestArray;
print "\npush Count is $Count\n";
if ( @TestArray ) { print"\nPush - TestArray IS here\n\n"; }
else { print"\nAfter Push - TestArray is NOT here\n\n"; }

splice( @TestArray );
$Count = scalar @TestArray;
print "\nsplice Count is $Count\n";
if ( @TestArray ) { print"\nAfter Splice - TestArray IS here\n\n"; }
else { print"\nSplice - TestArray is NOT here\n\n"; }

I'm also wondering if my RAM-clearing step is necessary. I'm loading the data into the array with @FileData = <NEXTFILE>;
Perl allocates more RAM to the array if the next file is larger than the one loaded before it. Does Perl also release RAM from the array if the next file is smaller than the one loaded before it ?
Dyslexics Untie !!!

Re: arrays : splice or undef ?
by davido (Cardinal) on Jun 04, 2013 at 17:11 UTC

    The best approach would be to leverage the benefits of Lexical Scoping. ...a brief example...

    for( 1 .. 100 ) {
        my @array;
        my $c = 0;
        while( $c++ < 500_000 ) {
            push @array, rand;
        }
        print "\@array holds ", scalar( @array ), " elements.\n";
    }

    On each iteration of the outer "foreach" loop, @array is declared, filled, checked for an element count, and then falls out of scope, at which time the memory is released back to Perl for the next iteration.

    If you watch Perl's memory usage during the run of this script you will see that after the first iteration of the 'foreach' loop, Perl never requires any significant additional memory.

    The other thing to consider is your algorithm itself. Do you need the entire file to be slurped into an array? Or can you iterate over it line by line and process each line individually? The latter will almost always be more memory efficient. And finally, even if you do slurp the entire file into an array, each time you slurp it again into the same array, the previous contents are discarded and that memory becomes available again to Perl. Nevertheless, careful use of lexical scoping solves a whole slew of potential problems, memory usage being only one of them.
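
    As a rough sketch of the two approaches described above (not the original poster's code), the script below counts lines in a file once by slurping into a lexical array and once by reading line by line. The helper names count_lines_slurped and count_lines_streamed, and the small temporary file standing in for a real data file, are assumptions made for illustration.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );

# Build a small stand-in for one of the large data files (hypothetical content).
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "line $_\n" for 1 .. 1000;
close $tmp_fh or die "Cannot close $tmp_path: $!";

# Slurping version (hypothetical helper): @file_data is lexical to the sub,
# so it falls out of scope when the sub returns and no explicit clearing
# step is needed.
sub count_lines_slurped {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @file_data = <$fh>;
    close $fh;
    return scalar @file_data;
}

# Line-by-line version (hypothetical helper): only one line is held at a time.
sub count_lines_streamed {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my $count = 0;
    while ( my $line = <$fh> ) {
        $count++;
    }
    close $fh;
    return $count;
}

my $slurped  = count_lines_slurped($tmp_path);
my $streamed = count_lines_streamed($tmp_path);
print "slurped: $slurped, streamed: $streamed\n";
```

    Both subs return the same count; the difference is only in how much of the file is held in memory at once.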


    Dave

      Though I haven't tested it, I assumed from the start that line-by-line processing would slow script execution due to the number of lines. The largest file I've run so far is 231 MB, with over 3.7 million lines.
      After the file is loaded, I do iterate each line individually. Meaning, subroutines with more arrays. The additional processing arrays are of course subsets of the file array, and another area in which I'm trying to avoid excessive disk I/O ( paging on Windows ).

        On my system it takes about 63/100ths of a second to read line by line through a file of 3.5 million lines that is 272 megabytes in size (that's the closest to 231MB and 3.7M lines that I happened to have lying around). That's with a no-op loop; whatever you do to process the lines of the file will consume time too, but it will consume virtually the same time whether you're iterating over lines from a file or over the elements of an array.
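
        A minimal sketch of how such a timing could be taken, assuming the core Time::HiRes module and a generated temporary file that is far smaller than the files under discussion. The file contents and the $line_total value are made up for illustration, and the loop only counts lines, so it is close to but not exactly a no-op.

```perl
use strict;
use warnings;
use File::Temp qw( tempfile );
use Time::HiRes qw( time );

# Hypothetical test size; the real files in this thread have millions of lines.
my $line_total = 100_000;

# Generate a throwaway file to read back.
my ( $tmp_fh, $tmp_path ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "sample line number $_\n" for 1 .. $line_total;
close $tmp_fh or die "Cannot close $tmp_path: $!";

open my $fh, '<', $tmp_path or die "Cannot open $tmp_path: $!";

# Time only the read loop, not the file generation above.
my $start = time;
my $seen  = 0;
while ( my $line = <$fh> ) {
    $seen++;    # stand-in for real per-line processing
}
my $elapsed = time - $start;
close $fh;

printf "Read %d lines in %.3f seconds\n", $seen, $elapsed;
```

        The elapsed time will vary with the machine and disk cache, so it is only useful for comparing against a slurping version run on the same system.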

        If performance is an issue, profile.


        Dave

Re: arrays : splice or undef ?
by BrowserUk (Patriarch) on Jun 04, 2013 at 18:00 UTC

    Given that undef is designed to do that job, I wonder why you would use anything else?

    1. @a = (); clears the scalars and sets the notional size as reported by scalar @a or $#a to zero, but it does not release or reduce the basic AV. Thus it hangs on to some memory even though the array is 'empty';
      use Devel::Size qw[ total_size ];;
      @a = 1 .. 1e6;; print total_size( \@a );;
      32000176
      @a = ();; print total_size( \@a );;
      8000176
    2. splice @a; has the same, hang-on-to-some-of-it effect as @a = ();
      @a = 1 .. 1e6;; print total_size( \@a );;
      32000176
      splice @a;; print total_size( \@a );;
      8000176
    3. undef however, does a proper job and releases (almost) all the associated memory back to the process pool:
      @a = 1 .. 1e6;; print total_size( \@a );;
      32000176
      undef @a;; print total_size( \@a );;
      176

      All it retains is the essential head structure.


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: arrays : splice or undef ?
by tobyink (Canon) on Jun 04, 2013 at 17:39 UTC

    "Using either method with a test script, Perl doesn't openly acknowledge the existence of the array after it's been undef'd or spliced."

    if ( @TestArray ) { print"\nPush - TestArray IS here\n\n"; }
    else { print"\nAfter Push - TestArray is NOT here\n\n"; }

    In the condition if (@TestArray), Perl is not checking whether @TestArray exists. It does exist. (You'd get a compilation error thanks to strict if it did not exist.)

    if (...) is (dare I say "always"?) equivalent to if (scalar(...)), and an array evaluated as a scalar yields the count of elements within it. So if (@TestArray) just means the same as if (is_true(count(@TestArray))), if Perl had is_true and count built-ins. Zero is of course false.
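
    To illustrate the point about scalar context, here is a small sketch that records both the element count and the truth value of an array after each operation from the original test script. The describe helper is a hypothetical name introduced only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: reports the element count and the truth value
# of the array when it is used as a condition.
sub describe {
    my ($array_ref) = @_;
    my $count = scalar @{$array_ref};
    return sprintf "count=%d, %s", $count, ( @{$array_ref} ? 'true' : 'false' );
}

my @test_array;
my @report;

push @report, describe( \@test_array );    # freshly declared, empty

push @test_array, 'some line';
push @report, describe( \@test_array );    # one element

undef @test_array;
push @report, describe( \@test_array );    # emptied with undef

push @test_array, 'some line';
splice @test_array;
push @report, describe( \@test_array );    # emptied with splice

print "$_\n" for @report;
```

    In every case the array variable still exists; the condition is false only because the element count is zero.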

    Personally, I think the easiest/clearest way of emptying an array is simply:

    @TestArray = ();

    I'm not sure how it benchmarks against undef or splice, but there's probably little difference.

    There may be some behaviour difference between undef, splice and =() in the case of tied arrays, but I've not investigated this.

    And I concur with Dave's conclusion that if possible you should avoid slurping the entire file into an array to begin with.

    package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name