Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a fairly substantial Perl program that processes a list of items. For each item, it creates a very large number of hash entries spread across ten or so hashes. After each item is completely processed, I set each of the hashes to the empty list (I also tried undef'ing them).

After running through 50 or so items, my hard drive begins working very hard and performance slows to a crawl. I presume that Perl has used up available system memory and Linux (RedHat 9 - Perl 5.8.0) has begun swapping.

I am reasonably sure that I don't have any circular references in my data structures that would circumvent garbage collection. It *appears* that Perl is using additional system memory rather than being more aggressive in its garbage collection.

Is there any way to "tune" the garbage collector? Alternatively, is there any way to monitor the memory space during execution to identify any uncollected/leaking memory?

Thanks :)

Re: Uncollected garbage leads to swapping ...
by davido (Cardinal) on Feb 25, 2004 at 05:13 UTC
    I'm curious what your code looks like. While it doesn't surprise me if the OS fails to reclaim freed memory, I'm surprised that you're finding that after a variable goes out of scope its memory isn't in some way reused by Perl. Could you boil down a snippet that does what you're talking about?

    Here is a test snippet where you can prove to yourself that Perl does reuse its own memory space. The snippet runs a continuous loop in which 10,001 hash elements are assigned integer values, and then the hash falls out of scope and is created again. If Perl weren't reusing the same memory, this snippet would fill all available memory almost immediately. But as I watch Perl's memory usage in the Windows Task Manager, it never jumps above about 4,300k, so my test doesn't reproduce your issue. At least in this case, Perl does the right thing.

    use strict;
    use warnings;

    while (1) {
        my %hash;
        @hash{0..10000} = (0..10000);
    }

    By the way (this is completely unrelated to your question), this snippet does intermittently give me a slightly weird error upon termination with a CTRL-C break:

    Terminating on signal SIGINT(2)
    Terminating on signal SIGINT(2)
    Attempt to free unreferenced scalar: SV 0x188d214 at C:\Perl\Scripts\mytest.pl line 9.

    ...line nine is the line with the hash slice on it. I'm using ActiveState Perl 5.8.2, binary build 808. The error occurred twice out of a few dozen repeated attempts. Of course this error is unrelated to your question, but it is curious nevertheless.


    Dave

Re: Uncollected garbage leads to swapping ...
by thospel (Hermit) on Feb 25, 2004 at 06:04 UTC
    One thing I sometimes do if I'm not sure some data structure is getting cleaned up (or when EXACTLY it gets cleaned up) is to make a "Canary" class, and add a reference to a new canary in each (interesting) instance of the data structure, counting them as I go. Class Canary also gets a DESTROY method where I lower the count again. This counter will then tell me at any point how many of the data structures are still around. And quite often the data structure itself can be the object that gets counted, so you don't even need an extra class.
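
    A minimal sketch of the canary idea described above. The Canary package, the build_structure helper and the sample payload are hypothetical names chosen for illustration, not code from the original poster's program.

```perl
use strict;
use warnings;

# Hypothetical Canary class: keeps a count of instances that are still alive.
package Canary;

my $alive = 0;

sub new     { my ($class) = @_; $alive++; return bless {}, $class }
sub DESTROY { $alive-- }
sub alive   { return $alive }

package main;

# Any data structure of interest; the canary simply rides along inside it.
sub build_structure {
    return { payload => [ 1 .. 10 ], canary => Canary->new };
}

{
    my @items = map { build_structure() } 1 .. 3;
    print "inside scope: ", Canary->alive, " alive\n";
}

# If the structures were really freed, the count is back to zero here.
print "after scope: ", Canary->alive, " alive\n";
```

    If the count printed after the scope stays above zero, something is still holding a reference to those structures.
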
Re: Uncollected garbage leads to swapping ...
by fizbin (Chaplain) on Feb 25, 2004 at 07:15 UTC
    Your comment about tuning the garbage collector is appropriate to an environment (like, say, most JVMs) that doesn't free objects as soon as the last reference to them goes away. Perl, however, is reference counted: it frees a value immediately when its reference count reaches 0, so there is no collector to tune.

    Perl 5.8.0 does, however, have a few known memory leaks - I suggest you check to see if your bug is mentioned on http://rt.perl.org. Googling for "perl 5.8.0 memory leak" may also prove useful. One thing you may find is that it's not your hashes that are leaking - as you haven't told us what else you do in your code, it's hard to say. (It's been reported that in some Perl versions, simply having a versioned "use" statement in a loop will end up leaking memory.)

    I'll add what another poster here said - see if you can reduce the code to a simple example that leaks and that you can post.

Re: Uncollected garbage leads to swapping ...
by kvale (Monsignor) on Feb 25, 2004 at 05:12 UTC
    One way to guarantee garbage collection is to exit a process. If nothing else works, launch a child process for each element of your list. The child will get the element, populate the hashes and whatever else, and exit.
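
    A minimal sketch of this approach, assuming a Unix-like system where fork is available. The process_item sub and the item list are hypothetical placeholders for the real per-item work.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-item processing.
sub process_item {
    my ($item) = @_;
    my %scratch = map { $_ => "$item-$_" } 1 .. 1000;
    return scalar keys %scratch;
}

my @items  = qw(alpha beta gamma);   # placeholder item list
my $failed = 0;

for my $item (@items) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work, then exit so the OS reclaims all of its memory.
        process_item($item);
        exit 0;
    }

    # Parent: wait for this child before starting the next one.
    waitpid($pid, 0);
    $failed++ if $? != 0;
}

print "items that failed: $failed\n";
```

    Anything the parent needs back from a child has to be passed explicitly, for example through a file or a pipe, since the child's hashes vanish when it exits.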

    -Mark

Re: Uncollected garbage leads to swapping ...
by dragonchild (Archbishop) on Feb 25, 2004 at 14:28 UTC
    With proper scoping of variables and avoidance of circular data structures, this should never happen. I'm willing to bet you are violating at least one of those.

    Your statement that you have to undef your hashes says, to me, that you're not properly scoping your variables. If you scope your variables as tightly as possible, Perl will do all the garbage collection for you.

    Real-life circular data structures are most easily demonstrated by the following:

    my %child = (
        a => 1,
    );
    my %parent = (
        child => \%child,
    );
    $child{parent} = \%parent;

    Now, unless you explicitly break the circular reference (or use a weak reference, e.g. via WeakRef or Scalar::Util's weaken), the memory used by %child and %parent will stick around until the process is done, even if the variables are out of scope.
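
    A minimal sketch contrasting a strong cycle with a weakened back-reference, assuming Scalar::Util's weaken is available (it ships with Perl 5.8 and later). The Node class and build_pair helper are hypothetical and exist only so that destruction can be observed.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical Node class: counts how many instances have been destroyed.
package Node;

my $destroyed = 0;

sub new       { my ($class) = @_; return bless {}, $class }
sub DESTROY   { $destroyed++ }
sub destroyed { return $destroyed }

package main;

# Builds a parent/child pair that reference each other, then lets both
# lexicals go out of scope when the sub returns.
sub build_pair {
    my ($use_weak) = @_;
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken($child->{parent}) if $use_weak;   # back-reference stops counting
    return;
}

build_pair(0);
print "after strong cycle:   ", Node->destroyed, " destroyed\n";

build_pair(1);
print "after weakened cycle: ", Node->destroyed, " destroyed\n";
```

    The strongly linked pair is not destroyed until the process exits, while the pair with the weakened back-reference is freed as soon as build_pair returns.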

    ------
    We are the carpenters and bricklayers of the Information Age.

    Please remember that I'm crufty and crotchety. All opinions are purely mine and all code is untested, unless otherwise specified.

Re: Uncollected garbage leads to swapping ...
by adrianh (Chancellor) on Feb 25, 2004 at 14:41 UTC
    Alternatively, is there any way to monitor the memory space during execution to identify any uncollected/leaking memory?

    You might want to look at Test::Memory::Cycle and Devel::Cycle to check for circular references.
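
    A minimal sketch of how Devel::Cycle might be used, assuming the module is installed from CPAN. The script loads it at run time so that it still runs without it. The %parent/%child structure is a made-up example with a deliberate cycle.

```perl
use strict;
use warnings;

# Build a deliberate parent/child cycle so there is something to detect.
my %child  = (a => 1);
my %parent = (child => \%child);
$child{parent} = \%parent;

# Load Devel::Cycle at run time so the script still runs without it.
if (eval { require Devel::Cycle; 1 }) {
    # find_cycle reports each cycle reachable from the given reference.
    Devel::Cycle::find_cycle(\%parent);
}
else {
    print "Devel::Cycle is not installed; install it from CPAN to run the check\n";
}
```

    Test::Memory::Cycle wraps the same check in a test function, which is convenient if the program already has a test suite.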

Re: Uncollected garbage leads to swapping ...
by zentara (Cardinal) on Feb 25, 2004 at 17:13 UTC
    I'm no internals expert, but maybe you could sprinkle some print statements to print out your hash sizes, every now and then, and look for a pattern.

    #!/usr/bin/perl
    use Devel::Size 'total_size';

    my %struct = (
        "key1" => ["bill", "ben"],
        "key2" => ["dave", "jen"],
    );
    print "total size of %struct - ", total_size(\%struct), "\n";

    I've been getting "bit" a lot lately by autovivification of hashes. I would delete and undef a hash entry, and then it would get "recreated" accidentally by some sub that was only supposed to be working with existing values. For instance:

    # update last viewed timestamp
    $info{$key_prev}{'timestamp'} = time;

    Well, I would delete and undef $info{$key_prev} and think it was gone, but nope, it got autovivified. So I needed:

    if (defined $info{$key_prev}) {
        # update last viewed timestamp
        $info{$key_prev}{'timestamp'} = time;
    }

    So maybe you should check for sneaky autovivifications?
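
    A minimal sketch showing that even a read-only test on a nested key can bring an entry back, and how guarding on the outer key avoids it. The %info hash and the "ghost" key are placeholders, not the real data.

```perl
use strict;
use warnings;

my %info;   # placeholder for the real structure

# Merely testing a nested key creates the intermediate hash as a side effect.
my $seen = exists $info{ghost}{timestamp};
print "after nested test:  ", (exists $info{ghost} ? "vivified" : "absent"), "\n";

delete $info{ghost};

# Testing the outer key first short-circuits before anything is created.
$seen = exists $info{ghost} && exists $info{ghost}{timestamp};
print "after guarded test: ", (exists $info{ghost} ? "vivified" : "absent"), "\n";
```

    Entries that come back this way are small individually, but across many keys they can keep a hash from ever really emptying.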

    I'm not really a human, but I play one on earth. flash japh