in reply to Memory management with long running scripts

jamesrleu,

First, Perl 5.8.8 is a good version to be stuck with. I have had several production systems running 5.8.8 for years without difficulty.

Since you say that the "long running perl scripts (daemons)" run for at least days, why not add code to the script to restart itself after 24 hours have elapsed? That immediately takes the pressure off you.
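
As a rough illustration of the self-restart idea, here is a minimal sketch. The names `$MAX_AGE`, `restart_due` and `restart_self` are made up for this example, and it assumes the script was started with a path in `$0` that is still valid and that `@ARGV` has not been consumed by option parsing.

```perl
use strict;
use warnings;

my $MAX_AGE    = 24 * 60 * 60;   # assumed lifetime in seconds (24 hours)
my $START_TIME = $^T;            # epoch time at which this script started

# True once at least $max_age seconds have passed since $started.
sub restart_due {
    my ( $started, $now, $max_age ) = @_;
    return ( $now - $started ) >= $max_age ? 1 : 0;
}

# Replace the running process with a fresh copy of the same script.
# exec only returns if it failed, hence the die.
sub restart_self {
    exec( $^X, $0, @ARGV ) or die "Cannot re-exec $0: $!";
}

# In the daemon's main loop, between units of work:
#     restart_self() if restart_due( $START_TIME, time, $MAX_AGE );
```

Because `exec` replaces the process image, whatever memory has accumulated is handed back to the system, so do the check at a point where no work is half finished.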

Another approach is to restart at a specific time ( like 3:22AM ). Pick a time when you have the least usage. For this we use crontab to schedule at 3:22 each day:

touch "/var/RestartPerlDaemons"
and at 3:32 each day
rm "/var/RestartPerlDaemons"
During those 10 minutes we do some cleanup, but if you don't need that, just remove the file at 3:23. Obviously the Perl scripts have to check for the existence of the file and shut down, and then not restart until the file is removed. Don't check on every cycle; use time() to check only every 10 seconds or so ( this saves on stat calls ).
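
A minimal sketch of the throttled check, assuming the flag file path from the crontab entries above. `restart_requested`, `$CHECK_INTERVAL` and the caching variables are names invented for this example, not part of any module.

```perl
use strict;
use warnings;

my $FLAG_FILE      = "/var/RestartPerlDaemons";   # file touched by cron
my $CHECK_INTERVAL = 10;                           # seconds between real checks
my $last_check     = 0;                            # time of the last stat
my $flag_seen      = 0;                            # cached result of that stat

# Returns 1 if the flag file existed at the last check, 0 otherwise.
# The filesystem is consulted at most once per $CHECK_INTERVAL seconds.
sub restart_requested {
    my ( $now, $file ) = @_;
    $file = $FLAG_FILE unless defined $file;
    if ( $now - $last_check >= $CHECK_INTERVAL ) {
        $flag_seen  = ( -e $file ) ? 1 : 0;
        $last_check = $now;
    }
    return $flag_seen;
}

# In the daemon's main loop:
#     exit 0 if restart_requested(time);
# In whatever starts the daemon again, wait for cron to remove the file:
#     sleep 10 while -e $FLAG_FILE;
```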

Memory leaks are among the most difficult problems to isolate. Others have given some good ideas to find the leaks, but you sound very frustrated by the situation.

In a *nix forked environment, after some specified time, the children exit and the parent forks a new child. To give you some idea of the values: on AIX the children exit after 8 hours; on some Linux systems it ranges from 2 hours to 12 hours. Restarting a clean child takes only seconds.
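
The child-recycling pattern can be sketched roughly as follows. This is only an outline with a single child; `run_child`, `spawn_and_wait` and the `$work` callback are hypothetical names, and a real prefork server would manage several children and handle signals.

```perl
use strict;
use warnings;
use POSIX ();

# Child side: repeat the work callback until the lifetime is used up,
# then exit so the parent can start a clean replacement.
sub run_child {
    my ( $max_life, $work ) = @_;
    my $deadline = time + $max_life;
    while ( time < $deadline ) {
        $work->();
    }
    POSIX::_exit(0);    # leave without running the parent's END blocks
}

# Parent side: fork one child, wait for it, and return its exit code.
sub spawn_and_wait {
    my ( $max_life, $work ) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    run_child( $max_life, $work ) if $pid == 0;    # never returns in the child
    waitpid( $pid, 0 );
    return $? >> 8;
}

# Parent main loop, e.g. with an 8 hour child lifetime:
#     spawn_and_wait( 8 * 60 * 60, \&do_one_unit ) while 1;
```

Any memory the child leaked goes back to the system when it exits, while the parent stays small because it does none of the real work.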

Perl depends on the system libraries, and if they have 'leaks', Perl is going to have leaks. Since you can't change your system, you need to minimize the impact of the problem on you.

Good Luck!

"Well done is better than well said." - Benjamin Franklin

Re^2: Memory management with long running scripts
by jamesrleu (Novice) on Aug 08, 2012 at 13:28 UTC

    Along the lines of your recommendation ..

    Now that I've made some progress in reducing the severity of the memory growth, I'm working on a forking version of my scripts, such that all of the prep work is done in the parent process (i.e. reading configuration, etc.) and the real work is done in child processes. The parent process will reap the children after a predetermined amount of time or based on memory usage.

      jamesrleu,

      Sounds like some of your frustration has been alleviated -- good!

      Here's some code you can put in the parent to test the expanding size of the children. You may want to verify that VSZ and RSS mean the same thing on your system. Use 'man ps' and it should tell you the definitions.

      ...
      my ($mem1,$mem2) = &Display_Mem_Usage($child[$no],$NAME,0);
      if ( $mem1 > 0 )
      {
          my $diff1 = $mem1 - $pmem1;
          my $diff2 = $mem2 - $pmem2;
          if ( $diff1 > $max_virtual ) { ... }   # kill the child
          elsif ( $diff2 > $max_real ) { ... }   # kill the child
      }
      ...

      sub Display_Mem_Usage
      {
          # VSZ is size in KBytes of the virtual memory ( VSZ * 1024 )
          # RSS is size in pages of real memory ( 1024 * RSS )
          my $cpid = shift;
          my $name = shift;
          my $from = shift;   ## Not used here, but in some scripts
          my $var  = "";
          my $fh;

          if ( ! ( kill 0 => $cpid ) )   ## Check that pid is active
          {
              return ( -1, -1 );
          }

          my $arg = qq| -o "vsz rssize" -p $cpid|;
          ## make sure you specify the full path to 'ps' command
          open ( $fh, "-|", "/bin/ps $arg" )
              or die "Prefork: Not open \'$arg\': $!";
          while (<$fh>) { $var .= $_; }
          close $fh;

          my $rno = my @ref = split(/\n/,$var);
          if ( $rno < 2 ) { return ( -1, -1 ); }
          my $info = join(" ", split " ", $ref[1]);
          my ($vmem,$rmem) = ( split(/\ /,$info) );
          return ( $vmem , $rmem );
      }

      If you decide to use this code, only call the subroutine from the parent. On AIX it worked for both the parent and children, but on Linux it would hang after 4-5 hours. There must be some type of race condition, but you don't really need to call it from the children. To use it properly, call the sub right after creating the child and save the returned sizes ($pmem1/2) in an array or hash. This way you can track the children and make sure they don't exceed your predetermined max sizes.

      For killing the children, I usually send 'ABRT' first, and then if the child still exists I send '-9' on the second pass. On the 3rd pass, if the child still exists, I email the system admin, and shut down and restart the whole process. It has never happened so far, but you have to prepare for worst cases.
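
      A minimal sketch of that escalation, assuming the pid is a direct child of the caller. `action_for_pass` and `stop_child_pass` are names invented for this example, and the third-pass handling (email, full restart) is left to the caller.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Map the pass number to what should happen on that pass.
sub action_for_pass {
    my ($pass) = @_;
    return 'ABRT' if $pass == 1;
    return 'KILL' if $pass == 2;
    return 'ESCALATE';    # caller emails the admin and restarts everything
}

# Run one pass against a child pid. Returns 'gone' if the child has
# already exited (or is not our child), otherwise the action taken.
sub stop_child_pass {
    my ( $pid, $pass ) = @_;

    # Reap first: an unreaped zombie would still answer to 'kill 0'.
    my $reaped = waitpid( $pid, WNOHANG );
    return 'gone' if $reaped == $pid || $reaped == -1;

    my $action = action_for_pass($pass);
    kill $action => $pid unless $action eq 'ESCALATE';
    return $action;
}

# In the parent's monitoring loop, one pass per cycle:
#     my $result = stop_child_pass( $pid, ++$pass_count{$pid} );
```

      Reaping with WNOHANG before each signal is a choice made for this sketch; it avoids counting a child as still alive when it has exited but not yet been collected.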

      Good Luck...Ed

      "Well done is better than well said." - Benjamin Franklin