codingchemist has asked for the wisdom of the Perl Monks concerning the following question:

short scenario: I have noticed with Perl on OS X that when I close a filehandle, the memory does not get completely deallocated. So when my Perl script opens and closes lots of extremely large files, the memory doesn't get freed, the VSIZE keeps growing, and eventually I get a malloc error. Any ideas?

long detailed scenario: I am sorting extremely large files (>2 GB) on Mac OS X. I take a chunk of the file (usually about 100000 lines), sort it, and store it in a tempfile. Then I take another chunk of the file, sort those lines, and merge them with the tempfile into another tempfile, repeating ad nauseam. I use two temp files and toggle between reading from and writing to them. What I do now is, every time I want to merge my sorted lines with a tempfile, open both temp files (one for reading, one for writing) and then close both of them before grabbing the next chunk of lines. For some reason, closing the files isn't cleaning out memory the way it should, so if I keep this up I get a malloc error and then the system panics (note: this doesn't happen on Linux). I am trying to rework it so I can leave the files open, but on OS X the O_RDWR|O_CREAT flags for sysopen aren't actually letting me read the file, just write to it, and the same goes for +>> with open. Any ideas?
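In outline, each merge pass looks something like this (a rough, untested sketch; the filenames and chunk handling are placeholders, and it assumes the "read" tempfile already exists):

  #!/usr/bin/perl
  use strict;
  use warnings;

  my ($read_tmp, $write_tmp) = ('sorted_a.tmp', 'sorted_b.tmp');

  sub merge_pass {
      my ($chunk) = @_;                        # ref to a sorted array of lines
      open my $in,  '<', $read_tmp  or die "open $read_tmp: $!";
      open my $out, '>', $write_tmp or die "open $write_tmp: $!";

      my $old = <$in>;
      for my $new (@$chunk) {
          # copy lines from the old tempfile until the new line fits
          while (defined($old) && $old le $new) {
              print {$out} $old;
              $old = <$in>;
          }
          print {$out} $new;
      }

      # drain whatever is left of the old tempfile
      print {$out} $old if defined $old;
      print {$out} $_ while <$in>;

      close $in;
      close $out or die "close $write_tmp: $!";

      # toggle the two tempfiles for the next pass
      ($read_tmp, $write_tmp) = ($write_tmp, $read_tmp);
  }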

Re: cleaning up memory after closing files on MACOSX
by hardburn (Abbot) on Aug 27, 2003 at 20:59 UTC

    Perl doesn't free memory back to the system until it exits, and there is really nothing that can be done about it. There are only a few OSes that could support such a feature (IIRC, most (all?) of them are embedded, real-time OSes). Fortunately, Perl will take memory it has already allocated and reuse it for something else later on. Until Perl needs those pages again, they will sit in the swap file, so this shouldn't be a big problem.

    However, I'm surprised MacOS X is freaking over this where Linux doesn't. Possibly a weakness in MacOS X's virtual memory implementation, or possibly something wrong with how Perl interacts with MacOS. Or you're not using tight lexical scoping, so that memory is never reclaimed by Perl (but I'd expect Linux to also complain in that case).
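    For example, a contrived (and, per my sig, untested) illustration of the scoping difference:

      #!/usr/bin/perl
      use strict;
      use warnings;

      my @big;                                   # file-scoped: keeps its data until reassigned

      for my $pass (1 .. 3) {
          @big = map { $_ * 2 } 1 .. 100_000;    # stays allocated between passes
      }

      for my $pass (1 .. 3) {
          my @chunk = map { $_ * 2 } 1 .. 100_000;   # lexical to the block:
      }                                              # handed back to Perl's pool here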

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

      You have it almost right: Perl running on most modern OSes will free memory, and the OS will try to absorb what is freed. The issue comes down to when and how the memory was allocated from the OS. If it was taken in small chunks (as in a loop where you grow an array without predefining how large it will be) and is then freed, the OS may have trouble taking it back and making it available again. If the memory was allocated in one large chunk, it has a better chance of being freed and returned to the pool after use. On the plus side, Perl will reuse allocated memory when it can, which can end up saving time. If the OP is seeing memory continue to grow every time he does a merge, he may want to look for a data structure that is not going out of scope and getting freed between passes; this may be a closure issue, or just a bug where a global variable is used instead of a locally scoped one that clears itself when you leave the scope.
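      For example, pre-extending the array is one way to grab the memory in a single large request up front instead of growing it piecemeal (a rough, untested illustration):

        #!/usr/bin/perl
        use strict;
        use warnings;

        my @lines;
        $#lines = 99_999;                  # reserve 100_000 slots in one allocation

        my $i = 0;
        while (defined(my $line = <STDIN>)) {
            $lines[$i++] = $line;
            last if $i == 100_000;         # chunk is full
        }
        $#lines = $i - 1;                  # trim any unused slots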

      -Waswas
        That's not quite right either. Most modern OS's have facilities for returning memory to the system (on UNIX, it's usually mmap(2)), but most software, including Perl, doesn't make good use of them. See, for example:
        #!/usr/bin/perl
        use constant SLEEP => 20;

        warn "Starting\n";
        my @arr = (1..1_000_000);
        warn "Allocated\n";
        sleep(SLEEP);

        undef @arr;
        warn "Freed.\n";
        sleep(SLEEP);
        Clearly most of the memory allocated will be in contiguous blocks, and could be reclaimed by the OS, but if you watch top you'll see that almost none ever is.
Re: cleaning up memory after closing files on MACOSX
by sgifford (Prior) on Aug 28, 2003 at 04:46 UTC

    It could be a memory leak in your code, in Perl, or in MacOS X. Try to reduce the problematic code to the smallest portion that still shows the memory leak; that should point you a good bit of the way. If you still have problems, post the (now very short) program, and one of the Monks here should be able to help you.

    For example...

      Here is basically where I have pinpointed the memory leak (caution: I don't know if this code will actually work, it's just a snippet from my program):
      $linectr=0;
      $ctr2=0;
      for $line (@array){
          $hash{$linectr}=$line;
          if($ctr2=100000){
              addtofile(\%hash);
              %hash=();
              $ctr2 = 0;
          }
          $ctr2++;
          $linectr++;
      }

      sub addtofile {
          my $hashref = shift;
          open(FH,$tempfile);
          foreach $value (keys %$hashref) {
              print FH "$$hashref{$value}:$value\n";
          }
          close FH;
      }
      When I close the FH, some memory gets deallocated and the RSS size in top goes down, but not completely, so it keeps growing and growing. On Linux this doesn't happen very fast, so my program usually finishes before I run out of memory, but on OS X the memory grows by leaps and bounds and I run out of memory after about 10-15 passes through the program.

      However, I found a solution to my problem using Berkeley DB. Since I am sorting files that are too large to fit in memory, I thought I would have to break them up and use a merge-sort algorithm, but if I tie an empty file to a Berkeley DB object I can treat the file as an array and insert lines into the middle of the file, so there is no need to keep opening and closing files.
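      For anyone curious, the idea looks roughly like this with DB_File's RECNO format (a simplified, untested sketch; the filename is a placeholder):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DB_File;
        use Fcntl qw(O_RDWR O_CREAT);

        # Tie an on-disk file to a Perl array using the RECNO (record
        # number) format, where each array element is one line of the file.
        my $file = 'sorted.txt';
        tie my @lines, 'DB_File', $file, O_RDWR | O_CREAT, 0666, $DB_RECNO
            or die "Cannot tie $file: $!";

        # Lines can be inserted into the middle of the "array" (and so
        # into the middle of the file) without juggling temp files.
        push @lines, "first line", "third line";
        splice(@lines, 1, 0, "second line, inserted into the middle");

        print "$_\n" for @lines;
        untie @lines;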

      janitored by ybiC: balanced <code> tags as per Monastery convention, and a bit o'formatting

        Glad you found a solution to your problem!

        As far as the memory leak, how big is @array? Does a loop like this leak memory?

        foreach my $i (0..$#array) {
            open(FH,"> temp$i") or die "open: $!";
            print FH $array[$i];
            close FH;
        }

        A few other random comments:

        • Do you realize that if($ctr2=100000) will always be true, since = is the assignment operator not a comparison operator?
        • A hash is a strange data structure to store an ordered list of things, like lines in a file. An array is more appropriate.
        • Even better, rather than keeping 100000 lines in memory at a time, just keep the output file open and print lines to it as you read them (see the sketch after this list).
        • Also consider using use strict and the -w flag, and using lexical variables for your loops:
          for my $line (@array){
          
          If you are doing something you don't realize, like causing a memory leak, this will often tell you. It also makes it clearer that a memory leak isn't happening, since the loop's data disappears as soon as the loop ends.
        • Or see if you can get sort(1) to do this for you.
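        To illustrate the streaming idea from the third bullet (untested; filenames are placeholders):

          #!/usr/bin/perl
          use strict;
          use warnings;

          # Stream each line straight to the output file instead of
          # buffering 100000 of them in a hash first.
          open my $in,  '<', 'input.txt'  or die "open input.txt: $!";
          open my $out, '>', 'output.txt' or die "open output.txt: $!";

          my $linectr = 0;
          while (my $line = <$in>) {
              chomp $line;
              print {$out} "$line:$linectr\n";   # same "line:linenumber" layout as before
              $linectr++;
          }

          close $in;
          close $out or die "close output.txt: $!";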