Along the lines of your recommendation...

Now that I've made some progress in reducing the severity of the memory growth, I'm working on a forking version of my scripts. All of the prep work (i.e. reading the configuration, etc.) is done in the parent process, the real work is done in child processes, and the parent reaps the children after a predetermined amount of time or based on memory usage.
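A minimal sketch of that parent/child layout, assuming time-based recycling only: the parent does its prep work once, forks a fixed number of workers, and asks any worker older than a limit to exit, then reaps and replaces it. The limits, %config, do_work and spawn_child are hypothetical names for illustration; a memory-based check could go in the same loop. The $max_total cap exists only so the demo terminates; a real server would keep respawning.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);    # WNOHANG

# Hypothetical limits for this sketch, not values from the original post.
my $workers     = 2;    # children alive at once
my $max_seconds = 1;    # recycle a child once it has run this long
my $max_total   = 4;    # demo only: total children to start before winding down

# Parent-side prep work (reading configuration etc.) is done once, before any
# fork, so every child inherits the result.
my %config = ( name => 'demo' );

# Hypothetical stand-in for the real work done in each child.
sub do_work {
    my ($cfg) = @_;
    sleep 1 while 1;    # pretend to be busy until the parent recycles us
}

my %start_time;    # child pid => time it was forked
my $started = 0;
my $reaped  = 0;

sub spawn_child {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {    # in the child
        do_work(\%config);
        exit 0;
    }
    $start_time{$pid} = time;    # in the parent
    $started++;
}

spawn_child() for 1 .. $workers;
while (%start_time) {
    # Ask any child that has run too long to exit.
    for my $pid (keys %start_time) {
        kill 'TERM', $pid if time - $start_time{$pid} >= $max_seconds;
    }
    # Reap children that have exited, without blocking, and replace them.
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        delete $start_time{$pid};
        $reaped++;
        spawn_child() if $started < $max_total;
    }
    sleep 1;
}
print "started $started children, reaped $reaped\n";
```

Doing the prep before forking means each child gets the configuration through the copied address space, and recycling children periodically returns any leaked memory to the system when they exit.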

Re^3: Memory management with long running scripts
by flexvault (Monsignor) on Aug 08, 2012 at 15:20 UTC

    jamesrleu,

    Sounds like some of your frustration has been alleviated -- good!

    Here's some code you can put in the parent to monitor the growing size of the children. You may want to verify that VSZ and RSS have the same meanings (and units) on your system; 'man ps' should give you the definitions.

        ...
        my ($mem1,$mem2) = &Display_Mem_Usage($child[$no],$NAME,0);
        if ( $mem1 > 0 )
        {
            my $diff1 = $mem1 - $pmem1;
            my $diff2 = $mem2 - $pmem2;
            if    ( $diff1 > $max_virtual ) { ... }  # kill the child
            elsif ( $diff2 > $max_real )    { ... }  # kill the child
        }
        ...

        sub Display_Mem_Usage
        {
            # VSZ is size in KBytes of the virtual memory ( VSZ * 1024 )
            # RSS is size of real (resident) memory; units vary by platform
            # (KBytes with Linux procps), so check 'man ps'
            my $cpid = shift;
            my $name = shift;
            my $from = shift;   ## Not used here, but in some scripts
            my $var  = "";
            my $fh;

            if ( ! ( kill 0 => $cpid ) )   ## Check that pid is active
            {
                return ( -1, -1 );
            }
            my $arg = qq| -o "vsz rssize" -p $cpid|;
            ## make sure you specify the full path to 'ps' command
            open ( $fh, "-|", "/bin/ps $arg" ) or die "Prefork: Not open \'$arg\': $!";
            while (<$fh>) { $var .= $_; }
            close $fh;
            my $rno = my @ref = split(/\n/,$var);
            if ( $rno < 2 ) { return ( -1, -1 ); }
            my $info = join(" ", split " ", $ref[1]);
            my ($vmem,$rmem) = ( split(/\ /,$info) );
            return ( $vmem , $rmem );
        }
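    If running /bin/ps on every pass is a concern, Linux exposes the same two numbers in /proc/<pid>/status as VmSize and VmRSS (both in kB). The sketch below is a swapped-in, Linux-only alternative, not what the post above uses; proc_mem_usage is a hypothetical name, and it keeps the (-1, -1) "not available" convention of Display_Mem_Usage.

```perl
use strict;
use warnings;

# Linux-only: read VmSize and VmRSS (both reported in kB) from /proc/<pid>/status.
# Returns (-1, -1) if the process is gone or the fields are absent (e.g. kernel threads).
sub proc_mem_usage {
    my ($pid) = @_;
    open(my $fh, '<', "/proc/$pid/status") or return (-1, -1);
    my ($vsz, $rss) = (-1, -1);
    while (my $line = <$fh>) {
        $vsz = $1 if $line =~ /^VmSize:\s+(\d+)\s+kB/;
        $rss = $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    close $fh;
    return ($vsz, $rss);
}

my ($vsz, $rss) = proc_mem_usage($$);    # this process, as a quick check
print "pid $$: VSZ=$vsz kB RSS=$rss kB\n";
```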

    If you decide to use this code, call the subroutine only from the parent. On AIX it worked from both the parent and the children, but on Linux it would hang after 4-5 hours. There must be some kind of race condition, but you don't really need to call it from the children anyway. To use it properly, call the sub right after creating each child and save the returned sizes ($pmem1/2) in an array or hash. That way you can track the children and make sure they don't exceed your predetermined maximum sizes.
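    The bookkeeping just described (save each child's starting sizes, then compare on later passes) might look like this. It is a sketch: the limits and the record_baseline / children_over_limit names are hypothetical, and the memory reader is passed in as a code ref so you can plug in Display_Mem_Usage or any other source. The demo uses a fake reader so it runs anywhere.

```perl
use strict;
use warnings;

# Hypothetical growth limits in kB; pick values that suit your workload.
my $max_virtual = 50_000;
my $max_real    = 20_000;

my %baseline;    # child pid => [ vsz, rss ] recorded right after the fork

# Record a child's starting sizes. $reader is any sub returning (vsz, rss), e.g.
# sub { Display_Mem_Usage($_[0], $NAME, 0) }.
sub record_baseline {
    my ($pid, $reader) = @_;
    my ($vsz, $rss) = $reader->($pid);
    $baseline{$pid} = [ $vsz, $rss ] if $vsz > 0;
}

# Return the pids whose growth since their baseline exceeds either limit.
sub children_over_limit {
    my ($reader) = @_;
    my @over;
    for my $pid (sort { $a <=> $b } keys %baseline) {
        my ($vsz, $rss) = $reader->($pid);
        next if $vsz < 0;    # process already gone
        my ($base_vsz, $base_rss) = @{ $baseline{$pid} };
        push @over, $pid
            if $vsz - $base_vsz > $max_virtual || $rss - $base_rss > $max_real;
    }
    return @over;
}

# Demo with a fake reader standing in for ps.
my %fake   = ( 101 => [ 1000, 500 ], 102 => [ 2000, 800 ] );
my $reader = sub { my $p = shift; exists $fake{$p} ? @{ $fake{$p} } : (-1, -1) };
record_baseline($_, $reader) for 101, 102;
$fake{102} = [ 2000, 30_000 ];    # child 102's RSS grows by 29,200 kB
my @over = children_over_limit($reader);
print "over limit: @over\n";      # prints "over limit: 102"
```

    Comparing against each child's own baseline, rather than an absolute size, keeps the check meaningful even though every child starts with the parent's already-loaded configuration.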

    To kill a child, I usually send 'ABRT' first, and if the child still exists on the second pass I send SIGKILL ('-9'). If it still exists on the 3rd pass, I email the system admin and shut down and restart the whole process. That has never happened so far, but you have to prepare for the worst case.
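    The escalation described above could be sketched like this. Assumptions: stop_child, notify_admin and the per-signal wait are hypothetical; notify_admin only warns where the post emails the admin; and the passes are compressed into one blocking call rather than spread over the parent's monitoring loop. $pid must be an unreaped child of the calling process, because waitpid is used to confirm that it is gone.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);    # WNOHANG

# Hypothetical hook: the original post emails the system admin at this point.
sub notify_admin {
    my ($msg) = @_;
    warn "ADMIN: $msg\n";
}

# Stop child $pid with progressively harsher signals: ABRT first, then KILL.
# After each signal, wait up to $wait seconds for the child to exit and reap it.
# Returns the name of the signal that worked, or undef after notifying the admin.
sub stop_child {
    my ($pid, $wait) = @_;
    $wait //= 2;
    for my $sig ('ABRT', 'KILL') {
        kill $sig, $pid;
        for (1 .. $wait) {
            sleep 1;
            return $sig if waitpid($pid, WNOHANG) == $pid;
        }
    }
    notify_admin("child $pid survived ABRT and KILL; shut down and restart");
    return;
}

# Demo: a child that ignores ABRT, so the second step is needed.
$SIG{ABRT} = 'IGNORE';    # inherited by the child across fork; harmless in this parent
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) {
    sleep 1 while 1;      # child: run until killed
    exit 0;
}
my $how = stop_child($pid, 2);
my $msg = defined $how ? "child $pid stopped by SIG$how" : "child $pid would not die";
print "$msg\n";
```

    SIGKILL cannot be caught or ignored, so a child that survives it is usually stuck in the kernel (for example in uninterruptible I/O), which is why the last step hands the problem to a human.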

    Good Luck...Ed

    "Well done is better than well said." - Benjamin Franklin