PerlMonks  

Long running tasks, perl and garbage collection

by GoCool (Scribe)
on May 20, 2009 at 19:11 UTC

GoCool has asked for the wisdom of the Perl Monks concerning the following question:

This very simple and seemingly trivial task of illustrating perl's garbage collection seems to exhibit the most unexpected results and I'm clueless as to why. One would expect to find a reduction in the RSS of the process when a huge hash is deallocated, undef'd *and* goes out of scope but far from decreasing, the RSS seems to have increased in the end.

Why isn't it being garbage collected when it goes out of scope? When exactly is the hash garbage collected by perl? If the answer is when the program exits, then it seems like we'd have bigger problems when dealing with long running processes such as mod_perl.

One plausible explanation could be that even though perl actually garbage collects it, it's not released to the OS to be reclaimed just yet. If that is indeed the case then it seems like long running tasks/processes like mod_perl continually grow in size until that particular apache/mod_perl http process is killed/terminated.

What am I missing here? Any insights appreciated.
perl -e '
    use strict;
    use warnings;

    print "\nInitial size:\n" . qx{ps -o rss $$};
    {
        my %x = ();
        for (my $i = 0; $i < 100000; $i++) {
            $x{$i} = 1;
        }
        print "\nafter allocating a huge hash:\n" . qx{ps -o rss $$};

        for my $k (keys %x) {
            delete $x{$k};
        }
        undef %x;
        print "\nafter deallocating the huge hash\n" . qx{ps -o rss $$};
    }
    print "\nafter the huge hash goes out of scope\n" . qx{ps -o rss $$};
'
-GoCool

Replies are listed 'Best First'.
Re: Long running tasks, perl and garbage collection
by ikegami (Patriarch) on May 20, 2009 at 19:21 UTC

    Two things to know about Perl memory management:

    • When memory is freed, it can be released to perl's free memory pool rather than being released to the OS. Future allocations will dip into this pool first.

    • Heap fragmentation can occur, which means memory may need to be allocated for a structure even though the total memory available within the heap is big enough to hold the structure.

    I don't know if that helps.
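    The free-pool behaviour in the first bullet can be sketched as follows. This is an illustrative script, not from the thread; it assumes a Linux-style `ps -o rss=`, and the helper `rss_kb` is my own:

```perl
use strict;
use warnings;

# Parse the resident set size (KiB) of this process from ps.
# Assumes a Linux-style ps that accepts `-o rss=` (no header).
sub rss_kb {
    my ($kb) = qx{ps -o rss= -p $$} =~ /(\d+)/;
    return $kb;
}

# First hash: its memory is freed into perl's pool, not back to the OS.
{
    my %first;
    $first{$_} = 1 for 1 .. 100_000;
}
my $after_first = rss_kb();

# Second hash of the same shape: it reuses the pooled memory,
# so RSS should barely move.
{
    my %second;
    $second{$_} = 1 for 1 .. 100_000;
}
my $after_second = rss_kb();

printf "after first: %d KiB, after second: %d KiB\n",
       $after_first, $after_second;
```

    On a typical Linux build the two numbers come out nearly identical, which is the pool at work.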

    By the way, you are wrong about the RSS increasing from the deallocation.

    ...
    print "\nbefore deallocating the huge hash\n" . qx{ps -o rss $$};
    undef %x;
    print "\nafter deallocating the huge hash\n" . qx{ps -o rss $$};
    ...

    Initial size:
    RSS
    1684

    after allocating a huge hash:
    RSS
    9088

    before deallocating the huge hash
    RSS
    12532

    after deallocating the huge hash
    RSS
    12020

    after the huge hash goes out of scope
    RSS
    12020

    The increase you saw was probably used to hold the results of keys %x.
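    As a sketch, that temporary list can be avoided by walking the hash with `each`, which returns one entry at a time; deleting the *current* key during an `each` loop is documented as safe:

```perl
use strict;
use warnings;

my %x;
$x{$_} = 1 for 1 .. 100_000;

# keys(%x) in list context builds a temporary list of all 100_000 keys;
# each() walks the hash one entry at a time, so no large list is created.
while ( defined( my $k = each %x ) ) {
    delete $x{$k};   # deleting the current key during each() is safe
}

print "remaining: ", scalar( keys %x ), "\n";   # prints "remaining: 0"
```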

Re: Long running tasks, perl and garbage collection
by JavaFan (Canon) on May 20, 2009 at 19:21 UTC
    What you are seeing is right. Your conclusions are wrong. Once perl allocates memory from the OS, it's unlikely to give it back to the OS. Sure, if a hash goes out of scope, its memory is reclaimed. By perl. Kept in reserve, so the next time perl needs memory, it's already there.

    Note that this behaviour is typical for (Unix) processes. Most "freed" memory isn't given back to the OS until the process terminates.

Re: Long running tasks, perl and garbage collection
by zwon (Abbot) on May 20, 2009 at 20:07 UTC

    Actually it depends on how you allocate memory and on the malloc implementation. E.g. on Linux, malloc may use brk to allocate memory if you ask for a small amount, or mmap if you ask for a lot. Memory allocated by brk usually can't be freed back to the OS, but mmap'ed memory can easily be munmap'ed. Here's an example that demonstrates this:

    use strict;
    use warnings;

    my $hr = {};
    my ($num, $size) = @ARGV;
    print "$num, $size\n";
    system("ps vp $$");
    for (1..$num) {
        $hr->{$_} = 'memory' x $size;
    }
    system("ps vp $$");
    undef $hr;
    system("ps vp $$");

    On my Ubuntu amd64 I got the following results:

    $ perl memory.pl 10000 1000
    10000, 1000
      PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
    14098 pts/0    S+     0:00      0     3 18164  2292  0.0 perl memory.pl 10000 1000
      PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
    14098 pts/0    S+     0:00      0     3 78204 62436  1.5 perl memory.pl 10000 1000
      PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
    14098 pts/0    R+     0:00      0     3 77384 61648  1.5 perl memory.pl 10000 1000

    $ perl memory.pl 100 100000
    100, 100000
      PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
    14105 pts/0    S+     0:00      0     3 18164  2292  0.0 perl memory.pl 100 100000
      PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
    14105 pts/0    S+     0:00      0     3 77552 61684  1.5 perl memory.pl 100 100000
      PID TTY      STAT   TIME  MAJFL   TRS   DRS   RSS %MEM COMMAND
    14105 pts/0    R+     0:00      0     3 18752  2888  0.0 perl memory.pl 100 100000

    If you check with strace, you'll see that perl uses brk in the first case, but mmap/munmap in the second.
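    The mmap side of this can also be shown with a single large scalar. This sketch assumes glibc on Linux, where allocations above the ~128 KiB mmap threshold are serviced via mmap and can be munmap'ed on free; a perl built with its own malloc (`usemymalloc`) may behave differently, and `rss_kb` is an illustrative helper:

```perl
use strict;
use warnings;

# Parse the resident set size (KiB) of this process from ps.
sub rss_kb {
    my ($kb) = qx{ps -o rss= -p $$} =~ /(\d+)/;
    return $kb;
}

my $before = rss_kb();
my $big    = 'x' x 50_000_000;   # one ~50 MB buffer: glibc services this via mmap
my $with   = rss_kb();
undef $big;                      # freeing an mmap'ed buffer munmaps it at once
my $after  = rss_kb();

printf "before=%d KiB  with=%d KiB  after=%d KiB\n", $before, $with, $after;
```

    Here `after` drops back near `before`, unlike the many-small-allocations case, which stays inflated.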

Re: Long running tasks, perl and garbage collection
by Fletch (Bishop) on May 20, 2009 at 19:22 UTC

    It depends on whether the OS and/or the malloc implementation in use can (and does) return memory to the OS; many can't or don't. See memory deallocation for more discussion.

    The cake is a lie.

Re: Long running tasks, perl and garbage collection
by chromatic (Archbishop) on May 20, 2009 at 23:04 UTC
    One plausible explanation could be that even though perl actually garbage collects it, it's not released to the OS to be reclaimed just yet.

    Allocating memory from the OS -- especially if it's a fixed size that you're likely to want to reallocate soon -- is a great way to make your program run slowly. I sped up all Parrot function calls a few minutes ago by removing a pair of malloc/free calls.

Re: Long running tasks, perl and garbage collection
by trwww (Priest) on May 20, 2009 at 21:28 UTC

    One plausible explanation could be that even though perl actually garbage collects it, it's not released to the OS to be reclaimed just yet. If that is indeed the case then it seems like long running tasks/processes like mod_perl continually grow in size until that particular apache/mod_perl http process is killed/terminated.

    Yeah, that's how it works. If you read around you'll find it pretty thoroughly documented and discussed.

    If I have an operation that I need to perform from a long-running process that's going to use up a big chunk of memory I want returned to the OS, I put the task in a job queue for a different, short-running process to handle.
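    That pattern can be sketched with a plain fork (the job-queue plumbing is elided): the child does the memory-hungry work, and all of its pages go back to the OS when it exits, leaving the parent slim.

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    # Child: build the big working set. In real use, write the results
    # to disk or a queue before exiting.
    my %huge;
    $huge{$_} = 'x' x 100 for 1 .. 100_000;
    exit 0;   # all of the child's memory is returned to the OS here
}

waitpid($pid, 0);
my $status = $?;
print "child exited with status $status\n";   # prints "child exited with status 0"
```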

    regards,

Re: Long running tasks, perl and garbage collection
by grinder (Bishop) on May 21, 2009 at 14:59 UTC

    In any programming language, memory leaks are the bane of long-running processes. You might be using a library that leaks; a library that is sane on your platform might leak on another.

    A master class programmer will take this into account and arrange to push as much processing as possible into transient processes. Apache and Postfix are two exemplars of this design.

    mod_perl also behaves this way. Looking at one of my sites, I see that the mod_perl controller was started on the 1st of May, and it's occupying about 50Mb. The oldest worker process is about 5 hours old and clocks in at about 105Mb. The youngest are about 30 minutes old and haven't risen much beyond the initial 50Mb. By tonight they will all have been reaped, and the RAM recycled. There may be some crappy CPAN code that leaks like a sieve in there, but as far as I am concerned it is below the radar, thanks to mod_perl's controller/worker architecture.

    This approach is simple to put into practice and surprisingly robust.
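    In Apache's prefork MPM, this worker recycling is typically arranged with MaxRequestsPerChild. An illustrative (not site-specific) fragment, using Apache 2.2-era directive names:

```
# Illustrative prefork settings: each worker serves a bounded number of
# requests, so even a slow leak is capped, and the RAM is recycled when
# the worker is reaped.
StartServers           5
MaxClients            20
MaxRequestsPerChild 1000
```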

    • another intruder with the mooring in the heart of the Perl
