mikfire has asked for the wisdom of the Perl Monks concerning the following question:

I have been working on some code to handle to kill a run-away process. One of the tricks to this solution is that the run-away process likes to fill up the process table. This means my assassin cannot rely on forking, system calls, etc.

To get around this, I downloaded and installed the Solaris::Procfs module and figured out how to use it. Everything looked fine and I started the daemon in a test mode.

Imagine my surprise when I checked the next morning and the process had a size of 53Mb and an RSS of 53Mb. Obviously, I have a bad memory leak.

I have looked over my code and have cleaned the obvious problems out. I fear the leak may actually be in the Solaris::Procfs module, which uses a fair amount of XS.

My questions are:

I have thought about running the daemon and having it spawn children to do its job. If a child cannot be spawned, the daemon will handle the job itself. This will not fix the leak, but should slow it down.

The code should ( hopefully ) be beneath the readmore tag.
TIA,
mikfire

#!/os/dist/bin/perl -w use Solaris::Procfs qw/:procfiles/; # Get a ps -auxw w/o shelling use Net::SNPP; # Messages to pagers use Sys::Syslog; # All other messages use Data::Dumper; my $killing; my $DEBUG = 1; #--- # There is a very nasty race condition I am trying to avoid. It coul +d be # possible that, between the call to get_procfs and the kill more pro +cesses # could be spawned. We need to make sure we kill them all. Thus, th +e # killing flag is set and we will kill until there are no more to be +killed. # # The loop has three basic portions: # 1. If the killing flag isn't set and the number of running ypserv +'s # is between 1 and 8, we sleep and loop. # 2. If there are no ypserv's running, we fork a child and exec a # ypstart in the child. When the child is reaped, we return to +the # top of the loop # 3. Otherwise, we kill everything named ypserv. I sort on the PPI +D, # hoping to kill the one spawning everybody else first. #--- while ( 1 ) { my %procs = (); get_procs(\%procs); my $procnum = keys %procs; #--- # Nothing wrong, go back to sleep #--- unless ( $killing || $procnum > 8 ) { LogIt( 'Nothing to kill', 'info' ); sleep 180; next; } #--- # If no ypservs are found, then we likely killed them all and it i +s time # to ( hopefully ) fork a new ypserv process. #--- unless ( $procnum ) { #--- # We will do the fork and exec by hand. That way we can flag erro +rs # and attempt to be smart. #--- my ($kid,$dead,$looped); if ( $DEBUG ) { LogIt( 'Would have forked ypstart', 'debug' ); sleep 300; next; } if ( $kid = fork ) { do { select(undef,undef,undef,0.5); $looped++; $dead = waitpid(-1,1); } until $dead == -1 || $looped > 20; if ( $looped > 20 ) { LogIt( "Could not reap ypstart", 'err' ); next; } $killing = 0; } elsif ( defined( $kid ) ) { exec "/usr/lib/netsvc/yp/ypstart"; } else { LogIt("Couldn't fork: $!",'err'); } } #--- # Start with the cement shoes and overcoats. I try to get the ( I + hope ) # ring leader first - which will have, I assume, a PPID of 1. #--- if ( $DEBUG ) { LogIt( "$procnum ypservs running, would have waxed $total", 'debug +' ); sleep 300; next; } $killing = 1; $total = kill 9, sort { $procs{$a}{PPID} <=> $procs{$b}{PPID} } ke +ys %procs; LogIt( "$procnum ypservs running, $total waxed", 'warning' ); } #--- # All the functions for Procfs seem to rely upon the pid. The only wa +y I can # think of getting this information is to grope /proc and get the info +rmation # for each directory I find. #--- sub get_procs { my $ref = shift; print localtime, "opening procfs\n"; opendir PIDS, "/proc" or die "Coudln't open /proc: $!"; while ( $pid = readdir PIDS ) { next if ( $pid =~ /^\.\.?\z/ ); my $info = psinfo $pid; unless ( defined( $info ) && ref $info eq 'HASH' ) { warn "Couldn't grope pid $pid: $!\n"; next; } #--- # Do nothing unless this is a ypserv #--- next unless ( $info->{pr_fname} eq 'ypserv' ); $ref->{PID} = $pid; $ref->{PPID} = $info->{pp_ppid}; $info = undef; } closedir PIDS; } #--- # Stupid logging function - sends a page for anything more important t +han info # and syslogs them all #--- sub LogIt { my ($message, $level ) = @_; my $paged = 0; $level ||= 'daemon'; if ( $level eq 'debug' ) { my $snpp = Net::SNPP->new('snpphost'); $paged = $snpp->send( Pager => 'mik', Message => $message, ); return 1; } unless ( $level eq 'info' ) { my $snpp = Net::SNPP->new('snpphost'); $paged = $snpp->send( Pager => 'sysadm', Message => $message, ); } openlog('Sacco', 'pid', 'daemon'); syslog( $level, "$message\n" ); syslog( 'err', "Paging failed\n" ) unless ( $level eq 'info' || $p +aged ); closelog(); }

Replies are listed 'Best First'.
Re: Memory leaks
by jeroenes (Priest) on Feb 05, 2001 at 21:04 UTC
      I hadn't disconnected this from the terminal yet - I had diagnostic output going to STDOUT. I do not think this is quite like your problem.

      mikfire

Re: Memory leaks
by goldclaw (Scribe) on Feb 06, 2001 at 02:36 UTC
    I don't have a Solaris box at home so I can't test this, but I can give you a few tips though.

    If Procfs is too blame, try creating a small test program that only calls psinfo, say a few thosuand times, and see if it grows out of proportion. If it does, you can pepper the Procfs.xs file(in the psinfo2hash function) with printf's. Make sure that all SV's stored in the hash have a reference count of 1. (Im a bit suspicious about the use of av_push in there. Anyone know if it increments the ref. count? ).

    And of cource, take a look at the perlguts man page. The perlxs man page might be helpfull as well, if only to understand the "magic" statements in Procfs.xs

    goldclaw

Re: Memory leaks
by goldclaw (Scribe) on Feb 07, 2001 at 16:16 UTC
    Ok, I got bored at work, took my own advice, and started debugging this thing. There are massive memory leaks in there. The author of the module didn't understand how to get a correct reference count. So instead of bother with the headaches caused by a to low reference count, he instead increased the ref count on everything returned, so that nothing would ever be freed. I have gone through it, and cleened up the stuff that were getting a to low ref count, and removed the extra increase on everything returned. Everything seems to be OK now. Ill submit my updates to the author, in the meantime you can grab the fixed xs file from:

    Procfs.xs

    goldclaw

Re: Memory leaks
by petral (Curate) on Feb 06, 2001 at 07:22 UTC
    $ref->{PID} = $pid; $ref->{PPID} = $info->{pp_ppid};
    Do you mean something more like: $ref->{$pid} = $info->{pp_ppid}; That won't solve your problem. I just don't understand how you get anything other than 2 keys with this code. (Of course, this probably happenned in your cut and paste and works perfectly well in the running program.)

    p
      I only need the two bits of information - the PID and the parent PID ( ie, PPID ). If you look at the portion where the comment mentions something about "cement shoes and overcoats", you will notice I am sorting by the PPID in a vain attempt to kill the main process.

      mikfire

        Fine, that's a good q&d thing to do and your code probably does it. The code you posted just got mangled so there is no key between $proc{} and {PPID}.
        $total = kill 9, sort { $procs{$a}{PPID} <=> $procs{$b}{PPID} } k +eys %procs; /\ /\
        that's all I was mentioning.

        p