wagnerc has asked for the wisdom of the Perl Monks concerning the following question:

Greetings. I've run into a massive memory leak problem. I have a multithreaded script that goes out, collects info from network devices, and dumps it to a file (it will eventually go to MySQL). The problem is that after about half an hour the memory usage starts spiraling out of control. 100 threads start out using about 500 MB, but that soon grows to 2 GB. I've pared the script down to the bare minimum trying to locate the leak, but to no avail.

A colleague said he heard something about a problem with localized hashes within hashes not freeing memory properly. Has anyone heard anything about this? The polling subroutine writes everything to a thread-localized hash tree and the main loop writes that hash to a file. The code looks like this:

    require Net::SNMP;
    use Net::DNS;
    require Digest::MD5;
    use threads ('stack_size' => 131072);
    use threads::shared;
    use Thread::Queue;
    use Thread::Semaphore;
    use Data::Dump qw(dump pp);

    {   ## thread entry point
        for (;;) {
            local %datahash;
            ..getdevicetopoll..
            &ifpoll($dev);
            print FILE dump %datahash;
        }
    }

    sub ifpoll {
        my %devinfo;
        my %interfaces;
        ## etc...
        ..snmp a lot of data here..
        %{$datahash{"devinfo"}}    = %devinfo;
        %{$datahash{"interfaces"}} = %interfaces;
        return;
    }
All the code is pretty much like that: collect data in my'd hashes, then assign that data to the main local'd hash and write it out. I checked the symbol table (use Symbol) and %datahash is indeed blank. I even wrote a sub that walked %datahash and manually deleted each key. No difference. I also used the Internals module to check ref counts on the intermediate variables, but they're not increasing. My fear here is that there's some stealth "closure" situation going on, something preventing the lexical data from being garbage collected. Every variable is either my'd or local'd.

Any ideas on what this could be? Any ideas on how to even troubleshoot this? I'm using Solaris 8 and ActiveState Perl 5.8.8 build 817. I've tried build 820 and compiling my own Perl with -DDEBUGGING. No difference.

If anybody has any ideas I'd greatly appreciate it. Thanks.

Replies are listed 'Best First'.
Re: Massive Perl Memory Leak
by halley (Prior) on Jun 11, 2007 at 17:09 UTC
    Don't use local %datahash, use my %datahash.

    Unless you really really understand what the difference is between these two keywords, 99.99% of the time, you want my.

    The local keyword is a way to temporarily shadow the original value so changes aren't kept, and then automatically restore that original value later.

    I see that you're using the structure in both routines, but it seems a bit odd to work with the global %datahash from both (and others). I think you might want to check out the use of references, so you can work with data structures more generally instead of having multiple routines trying to protect their values and effects on these global variables.

    --
    [ e d @ h a l l e y . c c ]

      Thanks for getting back to me. I'm using the local %datahash construct so I can avoid gotchas with passing references (but it seems I've run into another gotcha somewhere else). %datahash is essentially just a thread global, and the local is there to automatically garbage collect it as the forever loop restarts. I can't use my because the variable wouldn't be visible to the descendant subroutines. Now correct me if I'm wrong, but isn't %datahash garbage collected the moment the forever loop loops? It's not creating *new* %datahashes and keeping the *old* ones stashed away, right? I manually walk %datahash and delete everything out of it at the end of the loop, so either way it shouldn't be building up memory.
        $foo = 5;

        sub do_something { print $foo, $/; }

        sub main {
            do_something();
            {
                local $foo = 6;
                do_something();
            }
            do_something();
        }

        main();
        The above code is a typical use of local. Internally, it is conceptually equivalent to the following code (shown for the innermost block only). You'd get 5, 6, 5 printed.
        {
            my $unnamed_backup_of_old_value_of_foo = $foo;
            $foo = 6;
            do_something();
            $foo = $unnamed_backup_of_old_value_of_foo;
        }

        The actual storage for $foo is created by the first statement ($foo = 5). That storage is what gets changed whenever you see assignments being done.

        This of course can get very big and hairy with a huge hash of data to "back up" and "restore" in the equivalent %unnamed_backup_of_old_value_of_datahash. Also, deleting items from the local hash has no real bearing, since the whole hash just gets tossed out and restored from the backup anyway.
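        Applied to your loop, each pass is conceptually doing something like this (the backup name is made up, as above; real local swaps values on Perl's save stack rather than copying element by element, but the storage it has to keep alive is comparable):

            {
                my %unnamed_backup_of_old_value_of_datahash = %datahash;  # save the old value
                %datahash = ();                                           # fresh value for this pass
                &ifpoll($dev);                                            # fills %datahash
                print FILE dump %datahash;
                %datahash = %unnamed_backup_of_old_value_of_datahash;     # restore on scope exit
            }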

        If you're doing local from multiple threads, well, you can see where your memory is going. Secondly, I can quite easily imagine a race condition where your main shared global hash is getting "restored" in the wrong order, leading to pandemonium.

        --
        [ e d @ h a l l e y . c c ]

        I can't answer the garbage collection issue except to say what you probably already know: Perl frees an object's memory once it is no longer referenced.

        However, using global variables to "pass" information between subs is bad, bad, bad! Instead, pass a reference to the descendant subs so it is clear where information is being used and possibly modified. If you find you are passing a lot of parameters, consider using lightweight object-oriented techniques by simply blessing a hash into the current package, then calling the descendant subs on the object:

        my $obj = bless {datahash => \%datahash, ...};
        $obj->sub1();
        $obj->sub2();
        ...

        sub sub1 {
            my $self = shift;
            $self->sub3(wibble => 3);
        }
        ...

        sub sub3 {
            my ($self, %params) = @_;
            for my $key (keys %{$self->{datahash}}) {
                my $value = $self->{datahash}{$key} * $params{wibble};
                ...
            }
        }

        and if you have trouble with getting references going, ask! Writing nasty code to avoid learning a new and valuable technique will not reward you in the long run and, by the sound of it, not even in the short run. You may find References quick reference helps.


        DWIM is Perl's answer to Gödel
      Do any of you see any problem with my hash usage? Specifically:

       %{$datahash{"devinfo"}} = %devinfo;

      I'm curious as to how the deep nature of %devinfo is copied over to the %datahash branch. Do any refs to the original variable survive? Or is it a 100% clean copy with no strings attached?

      Would either of these be equivalent or better syntax?

      $datahash{"devinfo"} = \%devinfo;
      $datahash{"devinfo"} = %devinfo;   # this must be wrong

        Consider:

        use strict;
        use warnings;
        use Data::Dump::Streamer;

        my %devinfo  = (1 => {a => 'apple', b => 'orange'});
        my %datahash = (devinfo => {});

        %{$datahash{"devinfo"}} = %devinfo;

        Dump(\%devinfo, \%datahash, $datahash{devinfo});

        Prints:

        $HASH1 = {
          1 => {
            a => 'apple',
            b => 'orange'
          }
        };
        $HASH2 = { devinfo => 'A: $HASH3' };
        $HASH3 = { 1 => $HASH1->{1} };
        alias_hv(%$HASH2, 'devinfo', $HASH3);

        The copy is a shallow copy. If any of the values of %devinfo are references, then the copy simply duplicates the reference. You may find Storable's dclone helps if you are looking for a clone of the data.
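        A minimal sketch of the dclone approach, reusing the data above:

            use Storable qw(dclone);

            my %devinfo = (1 => {a => 'apple', b => 'orange'});
            my %datahash;

            $datahash{devinfo} = dclone(\%devinfo);   # deep copy: nothing shared
            $devinfo{1}{a} = 'pear';
            print $datahash{devinfo}{1}{a}, "\n";     # still 'apple'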


        DWIM is Perl's answer to Gödel
Re: Massive Perl Memory Leak
by Joost (Canon) on Jun 11, 2007 at 22:25 UTC
    Well, looking at the pseudocode it all depends on how much data you're putting in %datahash.

    100 threads taking up 2 GB is "only" 20 MB per thread; the thread-local data in %datahash alone could account for that. As noted above, you should not count on perl sharing any data between threads (not even internally) unless you explicitly arrange for it. In other words, perl threads behave much more like separate processes than most other thread implementations do.

    It's easy enough to check whether that's the problem: reduce the number of threads. If the amount of memory taken is stable relative to the number of threads, there probably isn't a leak; you're just suffering from perl's memory-hungry approach to problem solving.

    update: I just want to note that using threads in this program doesn't seem to be necessary (assuming you're using threads to run multiple SNMP connections in parallel). It's perfectly possible (and probably more efficient) to do that in a single thread. For instance, you could use non-blocking IO and select() instead. I believe that Net::SNMP even provides a callback/non-blocking interface. See also POE::Component::SNMP.
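    For example, a minimal sketch of the non-blocking interface as I understand it (the host list, community string, and OID are placeholders):

        use strict;
        use warnings;
        use Net::SNMP;

        my @hosts    = qw(10.0.0.1 10.0.0.2 10.0.0.3);
        my $sysDescr = '1.3.6.1.2.1.1.1.0';

        for my $host (@hosts) {
            my ($session, $error) = Net::SNMP->session(
                -hostname    => $host,
                -community   => 'public',
                -nonblocking => 1,
            );
            die "session: $error\n" unless defined $session;

            $session->get_request(
                -varbindlist => [ $sysDescr ],
                -callback    => sub {
                    my ($s) = @_;
                    my $r = $s->var_bind_list;   # undef on error
                    print "$host: ", (defined $r ? $r->{$sysDescr} : $s->error), "\n";
                },
            );
        }

        snmp_dispatcher();   # one event loop runs all the pending requests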

      The %datahash hash only accumulates 50-150K depending on the device, and it's blown away at each iteration. The script saves very little data persistently. I monitor those variables for size and they don't grow out of control. Watching the script with top shows the memory usage growing geometrically: slow at first, then going up 1-5 MB every few seconds. The memory usage accelerates.

      I have a previous version of the script that doesn't have the memory leak. The only real difference is the use of a central datahash to keep everything. The old script just directly prints everything. That's why I think this is the culprit.

      I wrote a test script that used PadWalker to check on the my'd variables with peek_my, and then Data::Dump'ed that hash. That showed something very interesting. After I assigned the independent hashes into the datahash (%{$datahash{"branch"}} = %branchdata;), the dump showed both hashes referring to the same data. It wasn't a pure copy. Like:

      do {
          my $a = {
              "%cdp" => {
                  "1.4" => {
                      id       => "switch",
                      ip       => "1.2.3.4",
                      platform => "WS-C6509",
                  }
              },
              "%datahash" => {
                  cdp => { "1.4" => 'fix', "1.5" => 'fix' },
              },
          };
          $a->{"%datahash"}{cdp}{"1.4"} = $a->{"%cdp"}{"1.4"};
      };

      Does this shed any light on anything? Thanks.

        If %branchdata contains references to hashes or arrays, then doing a shallow copy into a key of %datahash will result in the sharing (not in the threads::shared sense!) of the referenced data between the two structures. E.g.

        %a = ( 1 => { a => 'b', c => 'd' }, 2 => [ 1..5 ] );
        %{ $b{copy} } = %a;
        print Dumper \%a, \%b;

        $VAR1 = {
          '1' => { 'c' => 'd', 'a' => 'b' },
          '2' => [ 1, 2, 3, 4, 5 ]
        };
        $VAR2 = {
          'copy' => {
            '1' => $VAR1->{'1'},
            '2' => $VAR1->{'2'}
          }
        };

        To deep copy compound structures, you need to use Clone or (as someone else advised earlier) Storable::dclone().

        But whether that has anything to do with your memory growth is impossible to tell from the code shown. If both of these structures are lexical and being garbage collected, then that (shallow copying) should not be the cause of the symptoms you are describing.

        It sounds like you may be creating some kind of a circular reference somewhere. This could cause the kind of accelerating memory growth you mention. Maybe. But again, it's just guesswork without seeing the code.
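        For reference, the classic shape of such a leak, and the usual fix, looks like this (a made-up illustration, not code from your script):

            use Scalar::Util qw(weaken);

            for (1 .. 1_000) {
                my %node;
                $node{self} = \%node;    # cycle: the refcount can never reach zero
                # weaken($node{self});   # uncomment to break the cycle and free the hash
            }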

        And the problem does not seem to be related to your use of threads. Though it is obviously exaggerated by there being 100 times as many 'programs' all doing the same bad thing--whatever that is.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
        I have a previous version of the script that doesn't have the memory leak. The only real difference is the use of a central datahash to keep everything. The old script just directly prints everything.
        Can you show the parts of your non-leaking script that are equivalent to what you show in your initial question?
Re: Massive Perl Memory Leak
by BrowserUk (Patriarch) on Jun 11, 2007 at 17:48 UTC

    Working code is always preferable to pseudo-code.

    But...does it make any difference if you don't use Data::Dump::dump()? E.g. just print the hash rather than dumping it. My thought is that it may not be thread-safe and is accumulating internal data (say, for circular reference detection) that isn't being cleaned up, or is getting replicated.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      The hash can't be printed otherwise. It's a deep tree, and just print'ing it would give the top-level keys and HASH(0x12345). Sorry I only have pseudocode; the actual program is huge[TM]. Although I could try to make a junk program that does nothing except assign to %datahash.

      The %datahash tree essentially looks like:

      (
          "index", {
              baseip   => "1.2.3.4",
              deviceid => "81F1FD9964973BB63B3446881F901AF5",
              sysname  => "router",
          },
          "routes", {
              "0.0.0.0.0.0.0.0.2.3.4.5" => {
                  ifindex => 0,
                  mask    => "0.0.0.0",
                  net     => "0.0.0.0",
                  nexthop => "2.3.4.5",
                  proto   => 3,
                  type    => 4,
              },
          },
          "interfaces", {
              1 => {
                  ARP           => 3,
                  ifAdminStatus => 1,
                  ifAlias       => "",
                  ifDescr       => "FastEthernet0/0",
                  ifIndex       => 1,
                  ifOperStatus  => 1,
                  ifSpeed       => 100_000_000,
                  ifType        => "ethernetCsmacd",
              },
      I found the page that talked about the hash of hashes problem. It's http://www.chat11.com/Perl_Memory_Usage . Ring any bells?
        The hash can't be printed otherwise. It's a deep tree and just print'ing it would just give the top level keys...

        I meant: avoid the use of dump just temporarily, to see if it is the cause of the memory growth.

        If avoiding it fixed the immediate problem, then you could look for another way of recursively printing deep data structures that didn't suffer the same problem. Maybe Data::Dumper, or Data::Dump::Streamer or even rmap with a little logic of your own.
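        For what it's worth, the Data::Dumper variant is a one-line swap (a sketch; FILE and %datahash are from your original post):

            use Data::Dumper;

            print FILE Dumper(\%datahash);   # in place of: print FILE dump %datahash;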

        But first you need to track down the source of the problem. You could equally just comment out the print altogether.

        Is %datahash shared?

        What's in the hash doesn't matter.

        There is no logic I know of for using local rather than my in threads.

        Without sight of the full program (stick it on your scratchpad and /msg me if you don't want to post it), or a cut down but working version that demonstrates the same symptoms, it's really impossible to offer much in the way of help.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Massive Perl Memory Leak
by perrin (Chancellor) on Jun 11, 2007 at 18:54 UTC

    I don't know much about how thread data gets garbage collected. I do know threads use a lot of memory, though, since every time you make a new thread it copies all of the data structures in your perl interpreter. They are not shared, except for the ones you specifically mark.

    In addition, the way that memory allocated for perl structures gets returned when they go out of scope is not always intuitive. The memory typically stays allocated after the variable goes out of scope. But I don't know how this interacts with threads.

    Were you using local %datahash because you thought you needed that to prevent it from being shared by threads? You can actually just use our %datahash, since nothing is shared. Each thread will have a separate copy automatically.

      I was using local %datahash; as a cheap way of reinitializing the variable at each loop. Simply doing %datahash = (); would be functionally identical for my requirements.
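      So the loop would become something like this (a sketch, combining the our suggestion with the plain reset):

          our %datahash;          # package global: visible to descendant subs,
                                  # and each thread gets its own copy
          for (;;) {
              %datahash = ();     # cheap reinitialization, no save/restore
              ..getdevicetopoll..
              ifpoll($dev);
              print FILE dump %datahash;
          }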
Re: Massive Perl Memory Leak
by Zen (Deacon) on Jun 11, 2007 at 19:49 UTC
    I checked out the memory leak article you referenced. It gives no code nor evidence of any kind that it is a perl problem. That being said, there may or may not be a memory leak in this situation.

    In the short time I've been a member and seen the trepidation surrounding perl apps, I've never actually seen any "perl has a memory leak and it's ruining my program" posts substantiated. Mostly just unfounded suspicions that could be traced to what the other monks suggested: inconvenient scoping and non-thread-safe modules that could be keeping references to your hash, thereby foiling the garbage collector.
Re: Massive Perl Memory Leak
by BrowserUk (Patriarch) on Jun 12, 2007 at 23:45 UTC

    I have another suggestion. I should have thought to mention it earlier.

    Modify the program to call the thread code directly from main and comment out the thread creation step.

    If the 'leak' is associated with the threads it should 'go away' completely, but my bet is that it will still happen--only 100 times more slowly. Once you've fixed the leak in the non-threaded version, the threaded version won't leak either.

    The simplest and best way to debug threaded code is to make the thread functions work individually as non-threaded functions first, and only once you are happy they are working correctly, spawn copies.

    This is so ingrained in me, that it didn't cross my mind to mention it before now.
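    For example (poll_loop standing in for whatever your real thread entry point is called):

        ## my @workers = map { threads->create( \&poll_loop ) } 1 .. 100;  # disabled
        ## $_->join() for @workers;
        poll_loop();   # same code path, one thread: any leak left is a plain Perl leak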


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      But if you'd posted the code 3 days ago as asked, your problem would probably be fixed by now.
      Oh? :P The problem with that is that the code is over 100K, and there's really nothing wrong with my code. I mean, I've been staring at it for weeks, and I have been doing this for many years.

      I think I've found something with my rebuild effort: a code block that, when commented out, stops the script from leaking. I really defy anyone to find anything syntactically wrong with this code. :)

      ## all in ifpoll
      local %routes = ();
      my $result6;

      ## device can have CIDR or non-CIDR routing
      $result6 = $$session->get_entries(-columns =>
          [$ipCidrRouteIfIndex, $ipCidrRouteProto, $ipCidrRouteType]);
      $result6 = $$session->get_entries(-columns =>
          [$ipRouteNextHop, $ipRouteIfIndex, $ipRouteMask,
           $ipRouteProto, $ipRouteType]) unless %$result6;

      if (!defined($result6)) {
          printf(STDERR "ERROR(routes): %s %s.\n", $devarg, $$session->error);
      }

      local $testkey = each %$result6;

      if ($testkey =~ m/^\Q1.3.6.1.2.1.4.24.4.1.\E/o) {
          print "Entering CIDR parsing\n";
          foreach $key (keys %$result6) {
              if ($key =~ m/^\Q$ipCidrRouteProto\E/o) {
                  my @temp = $key =~ m/^\Q$ipCidrRouteProto\E\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.0\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/o;
                  my $id = join ".", @temp;
                  $routes{$id}{"proto"} = ${$result6}{$key};
              }
              elsif ($key =~ m/^\Q$ipCidrRouteType\E/o) {
                  my @temp = $key =~ m/^\Q$ipCidrRouteType\E\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.0\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/o;
                  my $id = join ".", @temp;
                  $routes{$id}{"type"} = ${$result6}{$key};
              }
              elsif ($key =~ m/^\Q$ipCidrRouteIfIndex\E/o) {
                  ## key is: dest network . dest mask . 0 . next hop
                  my @temp = $key =~ m/^\Q$ipCidrRouteIfIndex\E\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.0\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/o;
                  my $id = join ".", @temp;
                  $routes{$id}{"net"}     = $temp[0];
                  $routes{$id}{"mask"}    = $temp[1];
                  $routes{$id}{"nexthop"} = $temp[2];
                  $routes{$id}{"ifindex"} = ${$result6}{$key};
              }
          }
      }
      elsif ($testkey =~ m/^\Q1.3.6.1.2.1.4.21.1.\E/) {
          print "Entering non CIDR parsing\n";
          foreach $key (keys %$result6) {
              my $index;
              if (($index) = $key =~ m/^\Q$ipRouteNextHop\E\.(.+)/o) {
                  $routes{$index}{"nexthop"} = ${$result6}{$key};
                  $routes{$index}{"net"}     = $index;
              }
              elsif (($index) = $key =~ m/^\Q$ipRouteIfIndex\E\.(.+)/o) {
                  $routes{$index}{"ifindex"} = ${$result6}{$key};
              }
              elsif (($index) = $key =~ m/^\Q$ipRouteMask\E\.(.+)/o) {
                  $routes{$index}{"mask"} = ${$result6}{$key};
              }
              elsif (($index) = $key =~ m/^\Q$ipRouteProto\E\.(.+)/o) {
                  $routes{$index}{"proto"} = ${$result6}{$key};
              }
              elsif (($index) = $key =~ m/^\Q$ipRouteType\E\.(.+)/o) {
                  $routes{$index}{"type"} = ${$result6}{$key};
              }
          }
      }

      %{$datahash{"routes"}} = %routes;
      $session is a reference to the Net::SNMP object. $result6 is a reference to an anonymous hash containing the SNMP return values. Comment this all out and no memory leak.

      Any ideas?

        Best guess. If you just commented out the two lines that call $$session->get_entries(...), you'd still not see the memory growth. Not that that would be surprising, as the rest of the code wouldn't be doing much of anything.

        Looking at it from the other direction. Leave all the other code commented out and just call whichever of those two lines is appropriate--but do nothing with the results returned from the call--and the memory growth will return.

        $session is a reference to the Net::SNMP object

        Is that module thread safe? Are you reusing a single object to talk to multiple devices? Are you sharing one instance between threads?

        You might be able to check whether the session object is accumulating data internally, long after you have finished with it, by using Devel::Size on $session after each call to get_entries().
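        Something along these lines (a sketch reusing the names from your snippet; Devel::Size's total_size() walks the whole structure):

            use Devel::Size qw(total_size);

            my $before = total_size($$session);
            $result6 = $$session->get_entries(-columns => [$ipCidrRouteIfIndex]);
            printf STDERR "session grew by %d bytes\n",
                total_size($$session) - $before;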


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Massive Perl Memory Leak
by Jenda (Abbot) on Jun 13, 2007 at 14:41 UTC

    1) Don't use & when calling subroutines. And if you do, DO MAKE SURE you do not use it when calling a subroutine with no parameters! Unless you know what that means.

    2) Don't use

    %{$datahash{"devinfo"}} = %devinfo;
    use
    $datahash{"devinfo"} = \%devinfo;
    instead. It's silly to force perl to flatten the %devinfo hash into a list, build a new hash containing the data, and then throw %devinfo out the window. Just take a reference.

    3) Don't use the global. Pass a reference to the %datahash

    for (;;) {
        my %datahash;
        ..getdevicetopoll..
        ifpoll(\%datahash, $dev);
        print FILE dump %datahash;
    }
    ...
    sub ifpoll {
        my ($datahash, $dev) = @_;
        my %devinfo;
        my %interfaces;
        ## etc...
        ..snmp a lot of data here..
        $datahash->{"devinfo"}    = \%devinfo;
        $datahash->{"interfaces"} = \%interfaces;
        return;
    }

    Neither of those changes should affect the leak, though; it must be somewhere else in the code. Are you sure you are not producing any cyclic references in the part of the code you did not show us?

      Hi. When dealing with my own subs I always use the &name(); form. That way everything is made explicit and differentiating the subs from the builtins is easier. Wouldn't re-my'ing datahash in ifpoll break the reference back to the caller's datahash?

      Oh, so Perl flattens hash assignments? I thought it was smart enough to know you were assigning a hash to a hash.

      UK: The problem with validating it single-threaded is that the leak is obscured in Perl's normal bloat, and the number of iterations needed to show a noticeable memory increase would take a *long* time. Even with 100 threads it doesn't go into the death spiral for 20-30 minutes. But I know what you're saying.

        1) Well, if you feel better using the &, go ahead; just keep in mind that &foo(); and &foo; are two very different things. The first statement calls the foo() subroutine with no parameters, while the second one calls it aliasing its @_ to the current @_. That is, any changes to @_ made within foo() will affect the current subroutine!
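        A small demonstration (the sub names are made up):

            sub show { print "got: @_\n" }

            sub demo {
                &show();   # explicitly empty argument list: prints "got: "
                &show;     # reuses the caller's @_:         prints "got: 1 2 3"
            }

            demo(1, 2, 3);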

        2) No, the my affects only the variable, not its contents. So ifpoll() receives a reference to the caller's %datahash and assigns that reference to its own variable $datahash. The my itself doesn't affect the %datahash in any way.

        3) Perl has to flatten hash assignments. It might make some optimizations, like preallocating the right number of buckets, but it has to create a new hash and copy the values, so that changing one hash doesn't affect the other.
        The %newhash = %oldhash; statement makes a copy, not another name for the same "object".
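        A short illustration:

            my %old = (a => 1);

            my %copy = %old;       # flatten and rebuild: an independent hash
            $copy{a} = 2;
            print $old{a}, "\n";   # 1 -- the original is untouched

            my $ref = \%old;       # a reference: the same hash under another name
            $ref->{a} = 3;
            print $old{a}, "\n";   # 3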