Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

In a nutshell: I am looping over a group of files, reading each file's data into an array, processing it (writing out the processed data) by looping through said array @data, and then supposedly releasing the data at the end of the processing loop with @data=(); undef @data; ... Apparently I am missing something subtle, because this is not working. Even though I "clear" the array and close and reopen the <IN> filehandle for each successive file, the program's memory footprint gets larger with each iteration. What am I missing? Thanks in advance for your help.

Re: releasing memory from a loop in Windows
by gellyfish (Monsignor) on Aug 07, 2006 at 18:49 UTC

    You probably want to start by reading the FAQ entries that discuss memory usage in the first place.

    /J\

Re: releasing memory from a loop in Windows
by swampyankee (Parson) on Aug 07, 2006 at 19:08 UTC

    First, a suggestion: supply more information! Please read the section of the PerlMonks FAQ on "How do I post a question effectively?".

    After poking through the Camel Book (pp. 299-301 in my copy, the Sep 1996 printing of the 2nd edition), I found this: "When a block is exited, my variables are normally freed up." (italics mine). I suspect that means that for a loop like this:

    foreach my $file (@files) {
        open(my $fh, "<", $file)
            or die "Could not open $file because $!\n";
        my @data = <$fh>;
        close($fh);
    }
    #memory used for @data gets garbage collected here

    memory doesn't get garbage collected until after the comment (or at the fragment's die statement).

    I understand this to mean that code like the fragment above and the one below would g/c in the same way:

    {
        my @data;
        foreach my $file (@files) {
            open(my $fh, "<", $file)
                or die "Could not open $file because $!\n";
            @data = <$fh>;
            close($fh);
        }
    }

    added in update

    My understanding would also imply that the memory used will be based on the largest file processed in the loop. Also, Perl doesn't return memory to the O/S until it exits, so the (kilo|mega|giga|tera|peta|exa)byte or so you've used to read the largest of the files is kept in Perl's hot little hands.

    end of addition
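If the per-file processing can work one line at a time, a loop along the following lines holds only the current line in memory rather than the whole file, so the size of the largest file stops mattering. This is a minimal sketch, not the original poster's code (which was never shown): the sub name process_file and the uc() step are hypothetical stand-ins for the real processing.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path one line at a time.
# Only the current line is held in memory, so peak usage does not grow
# with the size of the file. uc() stands in for the real processing.
sub process_file {
    my ($in_path, $out_path) = @_;

    open(my $in,  '<', $in_path)  or die "Could not open $in_path: $!\n";
    open(my $out, '>', $out_path) or die "Could not open $out_path: $!\n";

    my $count = 0;
    while (my $line = <$in>) {
        print {$out} uc $line;
        $count++;
    }

    close($out) or die "Could not close $out_path: $!\n";
    close($in);
    return $count;    # number of lines processed
}
```

Given a list of input paths in @files (also an assumed name), it could be called as process_file($_, "$_.out") for @files;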

    Of course, there is a non-zero chance I'm wrong, and totally in left field (US idiom meaning "my answer is not only wrong but inane.").

    emc

    Experience is a hard teacher because she gives the test first, the lesson afterwards.

    Vernon Sanders Law

      First of all, Perl doesn't use garbage collection — it uses reference counting — so to say @data is *garbage collected* is wrong. Use the more general term *freed* instead.

      The general rule is that variables are freed at the end of the scope in which they are declared, and new ones are created the next time the scope is entered. In this case, the end of the scope is the end of the loop pass. (It is not the end of the loop.) That means @data gets freed at the end of every loop pass, so there is no accumulation of memory use. There's an optimization at play, but it is of no consequence.

      The exception is when a reference to a variable survives the scope in which the variable is declared. In the following, @data and its contents are NOT freed at the end of every loop pass, because a reference to the array survives the loop.

      my %all_files;
      foreach my $file (@files) {
          open(my $fh, "<", $file);
          my @data = <$fh>;
          $all_files{$file} = \@data;
          close($fh);
      }

      By the way, the close($fh) is redundant, since freeing $fh closes the file handle.
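The scoping rule and its exception can be observed with a small object whose DESTROY method records when it is freed. This is an illustrative sketch only: the Tracker class, the @events log and the %kept hash are hypothetical names that do not appear in the thread.

```perl
use strict;
use warnings;

my @events;    # records the order in which things happen

{
    package Tracker;    # hypothetical helper class for this demo

    sub new {
        my ($class, $name) = @_;
        return bless { name => $name }, $class;
    }

    # Perl calls DESTROY when the last reference to the object goes away.
    sub DESTROY {
        my ($self) = @_;
        push @events, "freed $self->{name}";
    }
}

my %kept;
for my $pass (1 .. 2) {
    my $local = Tracker->new("local $pass");    # referenced only inside this pass
    my $saved = Tracker->new("saved $pass");
    $kept{$pass} = $saved;                      # a reference survives the pass
    push @events, "end of pass $pass";
}
push @events, "after loop";

delete $kept{$_} for 1 .. 2;    # drop the surviving references
push @events, "after delete";

print "$_\n" for @events;
```

The log should show each "local" object freed right after its own pass ends, while the "saved" objects are freed only once their entries are deleted from %kept.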

      Caveat

      To speed things up, Perl loops free the contents of variables instead of freeing the variables themselves at the end of every loop pass. This is completely transparent and does not use up any extra memory. In this case, it means

      foreach my $file (@files) {
          open(my $fh, "<", $file);
          my @data = <$fh>;
          close($fh);
      }

      is equivalent to

      {
          my @data;
          my $fh;
          foreach my $file (@files) {
              open($fh, "<", $file);
              @data = <$fh>;
              close($fh);

              undef $fh;    # Frees $fh's content.
              undef @data;  # Frees @data's content.
          }
      }

        IMHO it uses "reference counting garbage collection". You don't have to allocate and deallocate memory yourself, do you? I don't see why the term "garbage collection" should be reserved for the "mark&sweep" and "copying" types.

        Except maybe that Microsoft, for marketing purposes, wanted people to believe .Net has something old VB did not have. So instead of saying that .Net has a mark&sweep garbage collector while VB used a reference counting one, they want everyone to think that .Net has garbage collection while VB did not.

Re: releasing memory from a loop in Windows
by ikegami (Patriarch) on Aug 07, 2006 at 23:35 UTC
    It's hard to see what you are missing, especially if it's something as subtle as you think, without seeing said nutshell. We'll need to see your code, since the problem is in your code.
Re: releasing memory from a loop in Windows
by rvosa (Curate) on Aug 08, 2006 at 01:10 UTC
    How are you measuring the memory footprint? If you do it from the perspective of the OS: I don't think perl returns freed memory back to the OS for the running time of the program. I thought that this is a general thing (not just perl), but this is all hearsay so perhaps someone can clarify (rather than get carried away by the exact definition of 'garbage collection').

      Under some circumstances, Perl (AS) under Win32 will release memory back to the OS.

      To demonstrate this, in the following snippet 'tasklist.exe', an MS executable, is used to query the OS for the current memory allocation for the current perl process before and after the allocation of a 100 MB chunk of memory, and again after the Perl scalar containing that memory is undef'd:

      c:\test>p1
      [0] Perl> system qq[tasklist /fi "PID eq $$"];;

      Image Name                   PID Session Name     Session#    Mem Usage
      ========================= ====== ================ ======== ============
      perl.exe                     888                         0      2,972 K

      [0] Perl> open RAM, '>', \$ram;;
      [Bad file descriptor] Perl> print RAM 'X' x 1e6 for 1 .. 100; print length $ram;;
      100000100

      [0] Perl> system qq[tasklist /fi "PID eq $$"];;

      Image Name                   PID Session Name     Session#    Mem Usage
      ========================= ====== ================ ======== ============
      perl.exe                     888                         0    101,008 K

      [0] Perl> close RAM;;
      [0] Perl> undef $ram;;
      [0] Perl> system qq[tasklist /fi "PID eq $$"];;

      Image Name                   PID Session Name     Session#    Mem Usage
      ========================= ====== ================ ======== ============
      perl.exe                     888                         0      3,256 K

      At startup, the perl executable is using just under 3 MB of ram.

      After the allocation of 95.36 MB (100 * 1e6) of memory to a Perl scalar (via a ramfile), the memory allocated to the executable stands at just over 101 MB.

      After the ramfile is closed and the variable $ram is undef'd, the memory allocated to the process falls back to just over 3MB.
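The same measurement can be scripted rather than typed into an interactive shell. The following is a rough sketch under stated assumptions: mem_report and show are hypothetical helper names, tasklist.exe is only consulted when running on Windows, and whether the final figure actually drops depends on the Perl build and on the circumstances described in this reply.

```perl
use strict;
use warnings;

# Hypothetical helper: return the tasklist report for this process,
# or undef when not on Windows (tasklist.exe is a Windows tool).
sub mem_report {
    return unless $^O eq 'MSWin32';
    my $pid = $$;
    return scalar qx[tasklist /fi "PID eq $pid"];
}

# Hypothetical helper: print a labelled memory report.
sub show {
    my ($label) = @_;
    my $report = mem_report();
    print "$label:\n",
        defined $report ? $report : "(tasklist not available on $^O)\n";
}

show('at startup');

# Build a large scalar through an in-memory file, as in the session above.
my $ram = '';
open(my $fh, '>', \$ram) or die "Could not open in-memory file: $!\n";
print {$fh} 'X' x 1e6 for 1 .. 100;    # roughly 100 MB in total
close($fh);
my $length = length $ram;
show("after writing $length bytes");

undef $ram;    # release the scalar's buffer
show('after undef');
```

Comparing the three reports gives the before, during and after figures for the process.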

      This occurs because, under Win32, large allocations of ram are allocated and freed using calls directly to the OS. See win32\vmem.c. Specifically, the freeing of memory allocated to the process back to the OS occurs in the following destructor (~ Vmem.c:510 in the source tree):

      VMem::~VMem(void)
      {
      #ifndef _USE_BUDDY_BLOCKS
          ASSERT(HeapValidate(m_hHeap, HEAP_NO_SERIALIZE, NULL));
      #endif
          WALKHEAPTRACE();

          DeleteCriticalSection(&m_cs);
      #ifdef _USE_BUDDY_BLOCKS
          for(int index = 0; index < m_nHeaps; ++index) {
              VirtualFree(m_heaps[index].base, 0, MEM_RELEASE);  <<<<<<<<< HERE!
          }
      #else /* !_USE_BUDDY_BLOCKS */
      #ifdef USE_BIGBLOCK_ALLOC
          for(int index = 0; index < m_nHeaps; ++index) {
              if (m_heaps[index].bBigBlock) {
                  VirtualFree(m_heaps[index].base, 0, MEM_RELEASE);  <<<<< HERE!
              }
          }
      #endif
          BOOL bRet = HeapDestroy(m_hHeap);
          ASSERT(bRet);
      #endif /* _USE_BUDDY_BLOCKS */
      }

      void VMem::ReInit(void)
      {
          for(int index = 0; index < m_nHeaps; ++index) {
      #ifdef _USE_BUDDY_BLOCKS
              VirtualFree(m_heaps[index].base, 0, MEM_RELEASE);  <<<<<< HERE!
      #else
      #ifdef USE_BIGBLOCK_ALLOC
              if (m_heaps[index].bBigBlock) {
                  VirtualFree(m_heaps[index].base, 0, MEM_RELEASE);  <<<<< HERE!
              }
              else
      #endif
                  HeapFree(m_hHeap, HEAP_NO_SERIALIZE, m_heaps[index].base);
      #endif /* _USE_BUDDY_BLOCKS */
          }

          Init();
      }

      See MSDN for further information on the operation of this and related OS apis.


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      In the absence of evidence, opinion is indistinguishable from prejudice.