Re: Massive Perl Memory Leak
by halley (Prior) on Jun 11, 2007 at 17:09 UTC
Don't use local %datahash, use my %datahash.
Unless you really really understand what the difference is between these two keywords, 99.99% of the time, you want my.
The local keyword is a way to temporarily shadow the original value so changes aren't kept, and then automatically restore that original value later.
While I see that you're using the structure in both routines, it seems a bit odd that you're trying to work with the global %datahash from both routines (and others). I think you might want to check out the use of references, so you can work with data structures more generally instead of having multiple routines try to protect their values and effects on these global variables.
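For instance, a minimal sketch of the reference-passing approach (the sub names here are made up, not from your code): each pass of the loop gets its own lexical hash, and the subs that fill it in receive a reference.

use strict;
use warnings;

sub poll_device {
    my ($data, $dev) = @_;
    $data->{devinfo}{sysname} = "router-$dev";   # the caller sees this via the reference
}

sub report {
    my ($data) = @_;
    print "$_ => $data->{devinfo}{$_}\n" for keys %{ $data->{devinfo} };
}

for my $dev (1 .. 3) {
    my %datahash;                    # fresh lexical each pass, freed automatically
    poll_device(\%datahash, $dev);
    report(\%datahash);
}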
-- [ e d @ h a l l e y . c c ]
Thanks for getting back to me. I'm using the local %datahash construct so I can avoid gotchas with passing references (but it seems I've run into another gotcha somewhere else). %datahash is essentially just a thread global, and the local is there to automatically garbage collect it as the forever loop restarts. I can't use my because the variable wouldn't be visible to the descendant subroutines.
Now correct me if I'm wrong, but isn't %datahash garbage collected the moment the forever loop loops? It's not creating *new* %datahashes and keeping the *old* ones stashed away, right? I'm also manually walking %datahash and deleting everything out of it at the end of the loop, so either way it shouldn't be building up memory.
$foo = 5;
sub do_something { print $foo, $/; }
sub main
{
    do_something();
    {
        local $foo = 6;
        do_something();
    }
    do_something();
}
main();
The above code is a typical use of local; it prints 5, 6, 5. Internally, it is conceptually equivalent to the following code (shown for the innermost block only).
{
    my $unnamed_backup_of_old_value_of_foo = $foo;
    $foo = 6;
    do_something();
    $foo = $unnamed_backup_of_old_value_of_foo;
}
The actual storage of $foo is created by the first statement ($foo = 5;). It is what gets changed whenever you see assignments being done.
This of course can get very big and hairy with a huge hash of data to "backup" and "restore" in the equivalent %unnamed_backup_of_old_value_of_datahash. Also, deleting items from the local hash would not have much bearing, since the whole hash will just get tossed out and restored from the backup.
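A tiny sketch of that last point: any deletes you do while the hash is localized simply vanish when the scope ends, because the original hash was never touched.

our %datahash = (keep => 1, junk => 2);
{
    local %datahash = (keep => 1, junk => 2);   # backup taken, temporary copy installed
    delete $datahash{junk};                     # only the temporary copy loses the key
}
print join(" ", sort keys %datahash), "\n";     # "junk keep" - the original is restored intact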
If you're doing local from multiple threads, well, you can see where your memory is going. Secondly, I can quite easily imagine a race condition where your main shared global hash is getting "restored" in the wrong order, leading to pandemonium.
-- [ e d @ h a l l e y . c c ]
I can't answer the garbage collection issue except to say what you probably already know: Perl frees an object's memory when it is no longer referenced.
However, using global variables to "pass" information between subs is bad, bad, bad! Instead pass a reference to the descendant subs so it is clear where information is being used and possibly modified. If you find you are passing a lot of parameters, consider using lightweight object-oriented techniques by simply blessing a hash into the current package then calling the descendant subs on the object:
my $obj = bless {datahash => \%datahash, ...};
$obj->sub1 ();
$obj->sub2 ();
...
sub sub1 {
    my $self = shift;
    $self->sub3 (wibble => 3);
}
...
sub sub3 {
    my ($self, %params) = @_;
    for my $key (keys %{$self->{datahash}}) {
        my $value = $self->{datahash}{$key} * $params{wibble};
        ...
    }
}
If you have trouble getting references going, ask! Writing nasty code to avoid learning a new and valuable technique will not reward you over the long run, and by the sound of it not even in the short run. You may find References quick reference helps.
DWIM is Perl's answer to Gödel
Do any of you see any problem with my hash usage? Specifically:
%{$datahash{"devinfo"}} = %devinfo;
I'm curious how the deep structure of %devinfo is copied over to the %datahash branch. Do any references to the original variable survive, or is it a 100% clean copy with no strings attached?
Would either of these be equivalent or better syntax?
$datahash{"devinfo"} = \%devinfo;
$datahash{"devinfo"} = %devinfo; # this must be wrong
use strict;
use warnings;
use Data::Dump::Streamer;
my %devinfo = (1 => {a => 'apple', b => 'orange'});
my %datahash = (devinfo => {});
%{$datahash{"devinfo"}} = %devinfo;
Dump (\%devinfo, \%datahash, $datahash{devinfo});
Prints:
$HASH1 = { 1 => {
a => 'apple',
b => 'orange'
} };
$HASH2 = { devinfo => 'A: $HASH3' };
$HASH3 = { 1 => $HASH1->{1} };
alias_hv(%$HASH2, 'devinfo', $HASH3);
The copy is a shallow copy. If any of the values of %devinfo are references then the copy simply duplicates the reference. You may find Storable's dclone helps if you are looking for a clone of the data.
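For instance, something along these lines (a sketch, assuming you want an independent deep copy stored under the devinfo key):

use Storable qw(dclone);

# dclone takes a reference and returns a reference to a deep copy,
# so nothing in $datahash{devinfo} points back into %devinfo.
$datahash{devinfo} = dclone(\%devinfo);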
DWIM is Perl's answer to Gödel
Re: Massive Perl Memory Leak
by Joost (Canon) on Jun 11, 2007 at 22:25 UTC
Well, looking at the pseudocode it all depends on how much data you're putting in %datahash.
100 threads taking up 2 GB is "only" 20 MB per thread. The thread-local data in %datahash alone could account for that. As noted above, you should not count on perl sharing any data between threads (not even internally) unless you explicitly arrange for that. In other words, perl threads behave much more like separate processes than most other thread implementations do.
It's easy enough to check if that's the problem: reduce the number of threads. If the amount of memory taken is stable relative to the number of threads there probably isn't a leak, you're just suffering from perl's memory-hungry approach to problem solving.
update: I just want to note that using threads in this program doesn't seem to be necessary (assuming you're using threads to run multiple SNMP connections in parallel). It's perfectly possible (and probably more efficient) to do that in a single thread. For instance, you could use non-blocking IO and select() instead. I believe that Net::SNMP even provides a callback/non-blocking interface. See also POE::Component::SNMP.
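As a rough sketch of that non-blocking style (the hostnames and the OID here are made up; check the Net::SNMP docs for the exact options you need), all the requests run in a single thread and the callbacks fire as replies arrive:

use strict;
use warnings;
use Net::SNMP qw(snmp_dispatcher);

my $sysName = '1.3.6.1.2.1.1.5.0';    # sysName.0, just as an example OID

for my $host (qw(router1 router2 router3)) {          # hypothetical hosts
    my ($session, $error) = Net::SNMP->session(
        -hostname    => $host,
        -community   => 'public',
        -nonblocking => 1,
    );
    die "session to $host failed: $error\n" unless defined $session;

    $session->get_request(
        -varbindlist => [$sysName],
        -callback    => [\&got_reply, $host],
    );
}

snmp_dispatcher();    # single-threaded event loop

sub got_reply {
    my ($session, $host) = @_;
    my $result = $session->var_bind_list();
    if (defined $result) {
        print "$host: $result->{$sysName}\n";
    }
    else {
        warn "$host: ", $session->error(), "\n";
    }
}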
The %datahash hash only accumulates 50-150K depending on the device, and it's blown away at each iteration. The script saves very little data persistently. I monitor those variables for size and they don't grow out of control. Watching the script with top shows the memory usage grow geometrically: slow at first, then going up 1-5 meg every few seconds. The memory usage accelerates.
I have a previous version of the script that doesn't have the memory leak. The only real difference is the use of a central datahash to keep everything. The old script just directly prints everything. That's why I think this is the culprit.
I wrote a test script that used PadWalker to check on the my'd variables with peek_my and then Data::Dump that hash. That showed something very interesting. After I assigned the independent hashes into the datahash (%{$datahash{"branch"}} = %branchdata;) the dump showed both hashes were referring to the same data. It wasn't a pure copy. Like:
do {
  my $a = {
    "%cdp" => {
      "1.4" => {
        id       => "switch",
        ip       => "1.2.3.4",
        platform => "WS-C6509",
      }
    },
    "%datahash" => {
      cdp => { "1.4" => 'fix', "1.5" => 'fix' },
    },
  };
  $a->{"%datahash"}{cdp}{"1.4"} = $a->{"%cdp"}{"1.4"};
};
Does this shed any light on anything? Thanks.
use Data::Dumper;

my %a = ( 1 => { a => 'b', c => 'd' }, 2 => [ 1 .. 5 ] );
my %b;
%{ $b{copy} } = %a;
print Dumper \%a, \%b;
This prints:
$VAR1 = {
'1' => {
'c' => 'd',
'a' => 'b'
},
'2' => [
1,
2,
3,
4,
5
]
};
$VAR2 = {
'copy' => {
'1' => $VAR1->{'1'},
'2' => $VAR1->{'2'}
}
};
To deep copy compound structures, you need to use Clone or (as someone else advised earlier) Storable::dclone().
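For example (a sketch using Clone; Storable::dclone works the same way on a reference):

use Clone qw(clone);

my %deep = %{ clone(\%a) };       # %deep shares nothing with %a
$b{copy} = clone(\%a);            # or store the cloned reference directly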
But whether that has anything to do with your memory growth is impossible to tell from the code shown. If both of these structures are lexical and being garbage collected, then that (shallow copying) should not be the cause of the symptoms you are describing.
It sounds like you may be creating some kind of a circular reference somewhere. This could cause the kind of accelerating memory growth you mention. Maybe. But again, it's just guesswork without seeing the code.
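For what it's worth, this is the classic shape of such a leak, sketched with made-up keys; Scalar::Util::weaken is the usual fix if it turns out you really do need the back-pointer:

use Scalar::Util qw(weaken);

my %iface = (name => 'FastEthernet0/0');
my %dev   = (sysname => 'router', interfaces => [ \%iface ]);
$iface{device} = \%dev;       # back-pointer: %dev and %iface now refer to each other,
                              # so neither refcount can ever reach zero on its own
weaken($iface{device});       # weakening the back-pointer breaks the cycle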
And the problem does not seem to be related to your use of threads. Though it is obviously exaggerated by there being 100 times as many 'programs' all doing the same bad thing--whatever that is.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I have a previous version of the script that doesn't have the memory leak. The only real difference is the use of a central datahash to keep everything. The old script just directly prints everything.
Can you show the parts of your non-leaking script that are equivalent to what you show in your initial question?
Re: Massive Perl Memory Leak
by BrowserUk (Patriarch) on Jun 11, 2007 at 17:48 UTC
Working code is always preferable to pseudo-code.
But...does it make any difference if you don't use Data::Dump::dump()? E.g. just print the hash rather than dumping it. My thought is that it may not be thread-safe and may be accumulating internal data (say, for circular-reference detection) that isn't being cleaned up or is getting replicated.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
| [reply] [d/l] |
(
"index",
{
baseip => "1.2.3.4",
deviceid => "81F1FD9964973BB63B3446881F901AF5",
sysname => "router",
},
"routes",
{
"0.0.0.0.0.0.0.0.2.3.4.5" => {
ifindex => 0,
mask => "0.0.0.0",
net => "0.0.0.0",
nexthop => "2.3.4.5",
proto => 3,
type => 4,
},
},
"interfaces",
{
1 => {
ARP => 3,
ifAdminStatus => 1,
ifAlias => "",
ifDescr => "FastEthernet0/0",
ifIndex => 1,
ifOperStatus => 1,
ifSpeed => 100_000_000,
ifType => "ethernetCsmacd",
},
I found the page that talked about the hash of hashes problem. It's
http://www.chat11.com/Perl_Memory_Usage . Ring any bells?
The hash can't be printed otherwise. It's a deep tree and just print'ing it would give only the top-level keys...
I meant avoid the use of dump just temporarily to see if it is the cause of the memory growth.
If avoiding it fixed the immediate problem, then you could look for another way of recursively printing deep data structures that didn't suffer the same problem. Maybe Data::Dumper, or Data::Dump::Streamer or even rmap with a little logic of your own.
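For example, a drop-in sketch using Data::Dumper in place of Data::Dump::dump (assuming FILE is the same output handle as in your pseudo-code):

use Data::Dumper;

$Data::Dumper::Sortkeys = 1;      # stable key order between runs
$Data::Dumper::Indent   = 1;      # a little more compact than the default
print FILE Dumper(\%datahash);    # recursive dump of the whole structure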
But first you need to track down the source of the problem. You could equally just comment out the print altogether.
Is %datahash shared?
What's in the hash doesn't matter.
There is no logic I know of for using local rather than my in threads.
Without sight of the full program (stick it on your scratchpad and /msg me if you don't want to post it), or a cut down but working version that demonstrates the same symptoms, it's really impossible to offer much in the way of help.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Massive Perl Memory Leak
by perrin (Chancellor) on Jun 11, 2007 at 18:54 UTC
I don't know much about how thread data gets garbage collected. I do know they use a lot of memory though, since every time you make a new thread it copies all of the data structures in your perl interpreter. They are not shared, except for the ones you specifically mark.
In addition, the way that memory allocated for perl structures gets returned when they go out of scope is not always intuitive. The memory typically stays allocated after the variable goes out of scope. But I don't know how this interacts with threads.
Were you using local %datahash because you thought you needed that to prevent it from being shared by threads? You can actually just use our %datahash, since nothing is shared. Each thread will have a separate copy automatically.
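A minimal sketch of what that looks like:

our %datahash;              # package global: still visible to the subs you call,
                            # and each thread gets its own unshared copy
for (;;) {
    %datahash = ();         # re-initialise at the top of each pass instead of local-izing
    # ... poll the device and fill %datahash as before ...
}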
I was using local %datahash; as a cheap way of reinitializing the
variable at each loop. Simply doing %datahash = (); would be functionally
identical for my requirements.
Re: Massive Perl Memory Leak
by Zen (Deacon) on Jun 11, 2007 at 19:49 UTC
I checked out the memory leak article you referenced. It gives neither code nor evidence of any kind that it is a perl problem. That being said, there may or may not be a memory leak in this situation.
In the short time I've been a member and seen the trepidation surrounding perl apps, I've never actually seen a "perl has a memory leak and it's ruining my program" post substantiated. Mostly they are unfounded suspicions that can be traced to what the other monks suggested: inconvenient scoping and non-thread-safe modules that could be keeping references to your hash, thereby foiling the garbage collector.
Re: Massive Perl Memory Leak
by BrowserUk (Patriarch) on Jun 12, 2007 at 23:45 UTC
I have another suggestion. I should have thought to mention it earlier.
Modify the program to call the thread code directly from main and comment out the thread creation step.
If the 'leak' is associated with the threads it should 'go away' completely, but my bet is that it will still happen--only 100 times more slowly. Once you've fixed the leak in the non-threaded version, the threaded version won't leak either.
The simplest and best way to debug threaded code is to make the thread functions work individually as non-threaded functions first. And only once you are happy they are working correctly, spawn copies.
This is so ingrained in me, that it didn't cross my mind to mention it before now.
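Concretely, something like this sketch (the $NO_THREADS flag, worker() and @devices are made up for illustration; the point is that the same sub runs either inline or as a thread):

use threads;

my $NO_THREADS = 1;    # flip to 0 once the single-threaded run no longer leaks

for my $dev (@devices) {
    if ($NO_THREADS) {
        worker($dev);                      # run the thread body inline, in the main thread
    }
    else {
        threads->create(\&worker, $dev);   # same code, now as a real thread
    }
}
$_->join() for threads->list();            # only relevant in the threaded case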
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
## all in ifpoll
local %routes = ();
my $result6;

## device can have CIDR or non-CIDR routing
$result6 = $$session->get_entries(-columns => [$ipCidrRouteIfIndex, $ipCidrRouteProto, $ipCidrRouteType]);
$result6 = $$session->get_entries(-columns => [$ipRouteNextHop, $ipRouteIfIndex, $ipRouteMask, $ipRouteProto, $ipRouteType]) unless %$result6;
if (!defined($result6)) {
    printf(STDERR "ERROR(routes): %s %s.\n", $devarg, $$session->error);
}

local $testkey = each %$result6;
if ($testkey =~ m/^\Q1.3.6.1.2.1.4.24.4.1.\E/o) {
    print "Entering CIDR parsing\n";
    foreach $key (keys %$result6) {
        if ($key =~ m/^\Q$ipCidrRouteProto\E/o) {
            my @temp = $key =~ m/^\Q$ipCidrRouteProto\E\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.0\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/o;
            my $id = join ".", @temp;
            $routes{$id}{"proto"} = ${$result6}{$key};
        }
        elsif ($key =~ m/^\Q$ipCidrRouteType\E/o) {
            my @temp = $key =~ m/^\Q$ipCidrRouteType\E\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.0\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/o;
            my $id = join ".", @temp;
            $routes{$id}{"type"} = ${$result6}{$key};
        }
        elsif ($key =~ m/^\Q$ipCidrRouteIfIndex\E/o) {
            # captures: dest network, dest mask, next hop
            my @temp = $key =~ m/^\Q$ipCidrRouteIfIndex\E\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.0\.(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})/o;
            my $id = join ".", @temp;
            $routes{$id}{"net"}     = $temp[0];
            $routes{$id}{"mask"}    = $temp[1];
            $routes{$id}{"nexthop"} = $temp[2];
            $routes{$id}{"ifindex"} = ${$result6}{$key};
        }
    }
}
elsif ($testkey =~ m/^\Q1.3.6.1.2.1.4.21.1.\E/) {
    print "Entering non CIDR parsing\n";
    foreach $key (keys %$result6) {
        my $index;
        if (($index) = $key =~ m/^\Q$ipRouteNextHop\E\.(.+)/o) {
            #print "Found next hop ${$result6}{$key} for $index in $key\n";
            $routes{$index}{"nexthop"} = ${$result6}{$key}; # || "NULL" or die "route assignment failed\n";
            $routes{$index}{"net"}     = $index;
        }
        elsif (($index) = $key =~ m/^\Q$ipRouteIfIndex\E\.(.+)/o) {
            $routes{$index}{"ifindex"} = ${$result6}{$key}; # || "NULL" or die "route assignment failed\n";
            #print "Found destination ${$result6}{$key} for $index in $key\n";
        }
        elsif (($index) = $key =~ m/^\Q$ipRouteMask\E\.(.+)/o) {
            $routes{$index}{"mask"} = ${$result6}{$key}; # || "NULL" or die "route assignment failed\n";
            #print "Found mask ${$result6}{$key} for $index in $key\n";
        }
        elsif (($index) = $key =~ m/^\Q$ipRouteProto\E\.(.+)/o) {
            $routes{$index}{"proto"} = ${$result6}{$key}; # || "NULL" or die "route assignment failed\n";
            #print "Found mask ${$result6}{$key} for $index in $key\n";
        }
        elsif (($index) = $key =~ m/^\Q$ipRouteType\E\.(.+)/o) {
            $routes{$index}{"type"} = ${$result6}{$key}; # || "NULL" or die "route assignment failed\n";
            #print "Found mask ${$result6}{$key} for $index in $key\n";
        }
    }
}
%{$datahash{"routes"}} = %routes;
$session is a reference to the Net::SNMP object. $result6 is a reference to an anonymous hash containing the SNMP return values.
Comment this all out and no memory leak.
Any ideas?
Best guess. If you just commented out the two lines that call $$session->get_entries(...), you'd still not see the memory growth. Not that that would be surprising, as the rest of the code wouldn't be doing much of anything.
Looking at it from the other direction. Leave all the other code commented out and just call whichever of those two lines is appropriate--but do nothing with the results returned from the call--and the memory growth will return.
$session is a reference to the Net::SNMP object
Is that module thread safe? Are you reusing a single object to talk to multiple devices? Are you sharing one instance between threads?
You might be able to check whether the session object is accumulating data internally long after you have finished with it by using Devel::Size on $session after each call to get_entries().
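Something like this sketch, for instance (Devel::Size is a CPAN module; total_size walks the whole structure the reference leads to):

use Devel::Size qw(total_size);

# after each call to get_entries():
warn "session object now holds ", total_size($$session), " bytes\n";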
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Massive Perl Memory Leak
by Jenda (Abbot) on Jun 13, 2007 at 14:41 UTC
1) Don't use & when calling subroutines. And if you do, DO MAKE SURE you do not use it when calling a subroutine with no parameters! Unless you know what that means.
2) Don't use
%{$datahash{"devinfo"}} = %devinfo;
use
$datahash{"devinfo"} = \%devinfo;
instead. It's silly to force perl to flatten the %devinfo hash into a list, build a new hash containing the data and then throw %devinfo out of the window. Just take a reference.
3) Don't use the global. Pass a reference to the %datahash
for (;;) {
    my %datahash;
    ..getdevicetopoll..
    ifpoll( \%datahash, $dev);
    print FILE dump %datahash;
}
...
sub ifpoll {
    my ($datahash, $dev) = @_;
    my %devinfo;
    my %interfaces; ## etc...
    ..snmp a lot of data here..
    $datahash->{"devinfo"}    = \%devinfo;
    $datahash->{"interfaces"} = \%interfaces;
    return;
}
Neither of those changes should affect the leak, though; it must be somewhere else in the code. Are you sure you are not producing any cyclic references in the part of the code you did not show us?
Hi. When dealing with my own subs I always use the &name(); form. That way everything is made explicit and differentiating the subs from the builtins is easier.
Wouldn't re-my'ing datahash in ifpoll break the reference back to the caller's datahash?
Oh, so Perl flattens hash assignments? I thought it was smart enough to know you were assigning a hash to a hash.
UK: The problem with validating as single-threaded is that the leak is obscured in Perl's normal bloat, and the number of iterations needed to show a noticeable memory increase would take a *long* time. Even with 100 threads it doesn't go into the death spiral for 20-30 minutes. But I know what you're saying.
1) Well, if you feel better using the & go ahead, just keep in mind that &foo(); and &foo; are two very different things. The first statement calls the foo() subroutine with no parameters, while the second one calls it aliasing its @_ to the current @_. That is, any changes to @_ made within foo() will affect the current subroutine!
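A tiny sketch of that difference (hypothetical subs, just to show the @_ aliasing):

sub shout { $_[0] = uc $_[0] }       # writes through whatever $_[0] is aliased to

sub demo {
    print "before: @_\n";            # before: hello
    &shout;                          # no parens, no args: shout reuses demo's @_
    print "after:  @_\n";            # after:  HELLO - demo's @_ (and the caller's $word) changed
}

my $word = 'hello';
demo($word);
print "word:   $word\n";             # word:   HELLO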
2) No, the my affects only the variable, not its contents. So ifpoll() receives a reference to the caller's %datahash and assigns that reference to its own variable $datahash. The my itself doesn't affect the %datahash in any way.
3) Perl has to flatten hash assignments. It might make some optimizations like preallocating the right number of buckets, but it has to create a new hash and copy the values, so that changing one hash doesn't affect the other.
The %newhash = %oldhash; makes a copy, not another name for the same "object".