jamesrleu has asked for the wisdom of the Perl Monks concerning the following question:

Exec Summary
------------
I'm struggling with memory management in long-running Perl scripts (daemons); the lifetimes of these scripts are measured in days. The scripts run on Linux servers.

Background
----------
For complicated reasons I'm stuck on Perl 5.8.8 and Linux kernel 2.4.31. The scripts are Event-based daemons that use timer and I/O Event callbacks. There are few if any XS-based modules in use (besides those in the core Perl distribution). Inside the callbacks I allocate anonymous arrays and hashes that are used to pass data into the depths of the script. I use Data::Dumper and eval() to persist some of the structures to disk so the scripts can pick up where they left off after a restart. The scripts continue to consume memory over time until they finally have to be restarted to free up memory.
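
Roughly, the persistence looks like this (a simplified sketch; the file name and helper names here are made up for illustration):

    use Data::Dumper;

    # Save the in-memory state to disk (illustrative only).
    sub save_state {
        my ($state) = @_;
        local $Data::Dumper::Purity = 1;    # emit code that rebuilds shared/self refs
        open my $fh, '>', '/var/run/mydaemon.state' or die "save: $!";
        print {$fh} Data::Dumper->Dump( [$state], ['state'] );
        close $fh;
    }

    # Re-create the state on startup with a string eval().
    sub load_state {
        open my $fh, '<', '/var/run/mydaemon.state' or return {};
        my $code = do { local $/; <$fh> };
        close $fh;
        my $state;
        eval $code;                         # assigns to $state
        warn "load_state: $@" if $@;
        return $state || {};
    }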

Troubleshooting so far
----------------------
I've used Devel::Leak, Devel::Mallinfo, and Devel::Cycle to try to get a better understanding of the problem. Devel::Mallinfo shows that there is plenty of free memory inside the process, yet the process continues to allocate more memory from the OS (which hints at memory fragmentation). Devel::Leak shows that there are allocations that are not being freed, and Devel::Cycle does not show any internal cycles in the array/hash refs.
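
The Devel::Leak check I'm doing looks roughly like this (simplified; the "one cycle of work" sub is a placeholder):

    use Devel::Leak;

    sub run_one_event_cycle { }                    # placeholder for one iteration of the daemon's work

    my $handle;
    my $before = Devel::Leak::NoteSV($handle);     # snapshot the SV arena

    run_one_event_cycle();

    my $after  = Devel::Leak::CheckSV($handle);    # dumps SVs created since NoteSV to stderr
    print "SV count: $before -> $after\n";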

Questions
---------
0) What additional information can I provide to help others help me?
1) Are there any known issues with perl 5.8.8 that could contribute to this behaviour?
2) Are there known issues with versions of Event that would cause this issue?
3) Is there a fundamental difference in how perl allocates memory for anonymous arrays/hashes vs @arrays and %hashes? (ie stack vs heap?) that would affect memory management?
4) Can the use of eval() cause this sort of issue?
5) Are there different 'tools' I could use to dig deeper?
6) Are there any red flags in my description that are 'bad practices' for these types of scripts?

Thank you for your help

Replies are listed 'Best First'.
Re: Memory management with long running scripts
by davido (Cardinal) on Jul 21, 2012 at 16:00 UTC

    You didn't list your XS modules. But if I were hunting for the memory leak and Devel::Leak wasn't helpful, I might start looking at bug reports for those XS modules (both open bugs and, if I'm using an older version of a module, resolved ones). For an older module I would also check its Changes file to see whether something relevant was fixed in a more recent version. Of course, if possible it would be a good idea to make sure all the modules are updated to their most current versions, especially the XS ones, though with such an old Perl you would want to test interoperability before upgrading in production.


    Dave

Re: Memory management with long running scripts
by Marshall (Canon) on Jul 21, 2012 at 16:14 UTC
    (2) Are there known issues with versions of Event that would cause this issue?

    I looked at Event:

    * $event->remove This removes an event object from the event-loop. Note that the object itself is not destroyed and freed. It is merely disabled and you can later re-enable it by calling $event->add.
    Maybe there is an issue with how these Events are handled?
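
    If the script keeps creating watchers inside callbacks and only ever disables them, the objects pile up. A contrived sketch (using Event->timer and ->cancel; check the docs for your Event version):

    use Event;

    sub do_work { }                 # placeholder

    my @watchers;
    Event->timer(
        interval => 1,
        cb       => sub {
            # A new one-shot watcher every second; if these are only ever
            # stopped/"removed" and never cancelled, the objects stay alive.
            push @watchers, Event->timer( after => 5, cb => \&do_work );
        },
    );

    # To let a watcher actually be destroyed, cancel it:
    # $_->cancel for splice @watchers;

    Event::loop();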

    (4) Can the use of eval() cause this sort of issue?

    I'm not sure how you are using eval. If you are eval'ing an ever-growing string, that would take more and more memory.

    (3) Is there a fundamental difference in how perl allocates memory for anonymous arrays/hashes vs @arrays and %hashes? (ie stack vs heap?) that would affect memory management?

    To my knowledge, no. Perl does not "free" memory back to the OS; once it has the memory, it is not returned. That is a separate question from allowing Perl to reuse, for its own purposes, memory it already has (e.g. by destroying Perl objects).

      To my knowledge, no. Perl does not "free" memory back to the OS; once it has the memory, it is not returned. That is a separate question from allowing Perl to reuse, for its own purposes, memory it already has (e.g. by destroying Perl objects).

      Not true for Perl on Windows; for Perl on unix, I don't know. I've heard on PerlMonks that unix malloc uses one contiguous memory block that grows upwards sequentially (sbrk style), while the MS C library/Windows malloc uses multiple non-contiguous pools, and allocations over a certain size basically go straight to the VM paging system (mmap style) and get random blocks of paging memory. According to p5p, until this month or last, compiled OPs were not freeable, or something similar. Weak references used to leak in 5.10, and I think that was fixed in 5.10.1 (I personally ran into that one). So there is a realistic chance your leak is in Perl and not in the XS modules. 5.8 is very old.

      Update, weak ref leak is https://rt.perl.org/rt3/Public/Bug/Display.html?id=56908.
        I am certainly willing to learn something new!
        Can you make a Perl process on Windows that allocates and uses a large amount of physical memory, say 500 MB worth, and then show that Perl "released it" to the OS without the Perl process terminating? I am running Perl 5.10.1.
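
        For reference, this is roughly how I'd watch it on Linux (a sketch; reading /proc obviously won't work on Windows, so a different probe would be needed there):

        # Grab the process's virtual size from /proc (Linux only).
        sub vsize_kb {
            open my $fh, '<', "/proc/$$/status" or die "proc: $!";
            while (<$fh>) { return $1 if /^VmSize:\s+(\d+)\s+kB/ }
            return 0;
        }

        printf "start:     %d kB\n", vsize_kb();
        {
            my $big = 'x' x ( 500 * 1024 * 1024 );   # roughly a 500 MB string
            printf "allocated: %d kB\n", vsize_kb();
        }
        printf "freed:     %d kB\n", vsize_kb();     # does the OS get it back?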
Re: Memory management with long running scripts
by Athanasius (Archbishop) on Jul 21, 2012 at 16:21 UTC
    Are there different 'tools' I could use to dig deeper?

    Have you looked at Test::LeakTrace? Dependencies are listed as “Perl 5.8.1 or later, and a C compiler.” so 5.8.8 should be fine. See the recent thread Leaking 0-length ARRAYs!?.
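
    For example, something along these lines (an untested sketch; put the suspect code inside the block):

    use Test::LeakTrace;

    # Report every SV that leaks while the block runs.
    leaktrace {
        my $data = { foo => [ 1 .. 10 ] };   # the code you suspect goes here
    } -verbose;

    # Or just count leaked SVs across the block:
    my $count = leaked_count {
        my $data = { foo => [ 1 .. 10 ] };
    };
    print "leaked SVs: $count\n";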

    HTH,

    Athanasius <°(((>< contra mundum

      Thank you Athanasius. By using Test::LeakTrace I was able to find some old modules that were leaking:

      • Log::Log4perl
      • DBI
      • Log::Dispatch
      • Log::Dispatch::FileRotate

      So I've made some progress. The scripts used to need to be restarted ~10 times per day; now they are down to fewer than 5.

        I kinda doubt those modules are actually leaking all that much
Re: Memory management with long running scripts
by flexvault (Monsignor) on Jul 22, 2012 at 14:00 UTC

    jamesrleu,

    First, Perl 5.8.8 is a good version to be stuck with. I have several production systems that have been running 5.8.8 for years without difficulty.

    Since you say that the "long running perl scripts (daemons)" run for at least days, why not add code to the script to restart itself after 24 hours have elapsed? That takes the pressure off you immediately.

    Another approach is to restart at a specific time (like 3:22 AM); pick a time when you have the least usage. For this we use crontab to run, at 3:22 each day:

        touch "/var/RestartPerlDaemons"

    and at 3:32 each day:

        rm "/var/RestartPerlDaemons"

    During those 10 minutes we do some cleanup, but if you don't need to do that then just do the remove at 3:23. Obviously the Perl scripts have to check for the existence of the file and shut down cleanly, and then not restart until the file has been removed. Don't check on every cycle; use time to check every 10 seconds or so (it saves on stat calls). A sketch of that check follows.
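
    Inside the daemon, the check could look something like this (a sketch; clean_shutdown() is a placeholder for your own cleanup, and re-exec'ing the same script is just one way to restart):

    use Event;

    my $flag = "/var/RestartPerlDaemons";

    # Poll the restart flag every 10 seconds rather than on every event.
    Event->timer(
        interval => 10,
        cb       => sub {
            return unless -e $flag;      # no flag: keep running
            clean_shutdown();            # placeholder: save state, close handles
            sleep 1 while -e $flag;      # blocking is fine, we're shutting down
            exec $^X, $0, @ARGV;         # start a fresh copy of this script
        },
    );

    sub clean_shutdown { }               # placeholder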

    Memory leaks are among the most difficult problems to isolate. Others have given some good ideas to find the leaks, but you sound very frustrated by the situation.

    In a *nix forking setup like this, after some specified time the children exit and the parent forks a new child. To give you some idea of the variables: on AIX the children exit after 8 hours, and on some Linux systems it ranges from 2 hours to 12 hours. But restarting with a clean child takes only seconds.

    Perl depends on the system libraries, and if they have 'leaks', Perl is going to have leaks. Since you can't change your system, you need to minimize the problem for you.

    Good Luck!

    "Well done is better than well said." - Benjamin Franklin

      Along the lines of your recommendation ..

      Now that I've made some progress in reducing the severity of the memory growth, I'm working on a forking version of my scripts, such that all of the prep work (reading configuration, etc.) is done in the parent process, the real work is done in child processes, and the parent reaps the children after a predetermined amount of time or based on memory usage.
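
      Roughly this shape (a simplified sketch; the 8-hour limit and the worker body are placeholders):

      use POSIX qw(WNOHANG);

      my $max_child_age = 8 * 3600;      # recycle children after 8 hours (placeholder)
      my %started;                       # pid => start time

      while (1) {
          # keep one worker running; a real version would track several
          if ( !keys %started ) {
              my $pid = fork();
              die "fork: $!" unless defined $pid;
              if ( $pid == 0 ) {
                  do_real_work();        # child: run the event loop, callbacks, etc.
                  exit 0;
              }
              $started{$pid} = time;
          }

          # reap exited children, recycle ones that have run too long
          while ( ( my $pid = waitpid( -1, WNOHANG ) ) > 0 ) {
              delete $started{$pid};
          }
          for my $pid ( keys %started ) {
              kill 'TERM', $pid if time - $started{$pid} > $max_child_age;
          }
          sleep 10;
      }

      sub do_real_work { }               # placeholder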

        jamesrleu,

        Sounds like some of your frustration has been alleviated -- good!

        Here's some code you can put in the parent to test the growing size of the children. You may want to verify how VSZ and RSS are reported on your system; 'man ps' should give you the definitions.

        ...
        my ($mem1,$mem2) = &Display_Mem_Usage($child[$no],$NAME,0);
        if ( $mem1 > 0 )
          {
            my $diff1 = $mem1 - $pmem1;
            my $diff2 = $mem2 - $pmem2;
            if    ( $diff1 > $max_virtual ) { ... }   # kill the child
            elsif ( $diff2 > $max_real    ) { ... }   # kill the child
          }
        ...

        sub Display_Mem_Usage
        {
            # VSZ is size in KBytes of the virtual memory ( VSZ * 1024 )
            # RSS is size in pages of real memory         ( 1024 * RSS )
            my $cpid = shift;
            my $name = shift;
            my $from = shift;   ## Not used here, but in some scripts
            my $var  = "";
            my $fh;

            if ( ! ( kill 0 => $cpid ) )   ## Check that pid is active
              { return ( -1, -1 ); }

            my $arg = qq| -o "vsz rssize" -p $cpid|;
            ## make sure you specify the full path to the 'ps' command
            open ( $fh, "-|", "/bin/ps $arg" ) or die "Prefork: Not open '$arg': $!";
            while (<$fh>) { $var .= $_; }
            close $fh;

            my $rno = my @ref = split(/\n/,$var);
            if ( $rno < 2 ) { return ( -1, -1 ); }

            my $info = join(" ", split " ", $ref[1]);
            my ($vmem,$rmem) = ( split(/\ /,$info) );
            return ( $vmem , $rmem );
        }

        If you decide to use this code, only call the subroutine from the parent. On AIX it worked from both the parent and the children, but on Linux it would hang after 4-5 hours (it must be some type of race condition), and you don't really need to call it from the children anyway. To use it properly, call the sub after creating the child and save the returned sizes ($pmem1/$pmem2) in an array or hash. This way you can track the children and make sure they don't exceed your predetermined maximum sizes.

        For killing the children, I usually send 'ABRT' first, and then if the child still exists I send '-9' on the second pass (roughly as sketched below). On the third pass, if the child still exists, I email the system admin and shut down and restart the whole process. That has never happened so far, but you have to prepare for the worst case.
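
        A bare-bones version of that escalation (a sketch; the signals and delays are just what I use):

        # Try ABRT first, then KILL; alert the admin if the child survives both.
        sub stop_child {
            my ($pid) = @_;
            for my $sig ( 'ABRT', 'KILL' ) {
                kill $sig, $pid;
                sleep 2;                               # give it a moment to exit
                return 1 unless kill 0 => $pid;        # gone, we're done
            }
            notify_admin("child $pid would not die");  # placeholder for your alerting
            return 0;
        }

        sub notify_admin { warn "@_\n" }               # placeholder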

        Good Luck...Ed

        "Well done is better than well said." - Benjamin Franklin

Re: Memory management with long running scripts
by sundialsvc4 (Abbot) on Jul 23, 2012 at 12:50 UTC

    I wonder to what extent the matter could be addressed through some kind of small redesign. Is it possible for the scripts to terminate themselves after a certain number of cycles, knowing that they will immediately be re-spawned and able to resume their duties? Yes, this is a pure circumvention strategy, but at least in the short run it may be the most practical option.

Re: Memory management with long running scripts
by Anonymous Monk on Jul 24, 2012 at 16:19 UTC
    Have a look at your callback subs. Do they make use of $self? Does $self keep a reference to your event object? If so, you may want to call

    weaken($self) if !isweak($self);

    (from Scalar::Util) in your callback subs to get rid of circular refs; see the sketch below.
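
    A variant of the same idea is to weaken the reference that the watcher's callback holds, rather than the callback's own copy (a rough sketch; Scalar::Util provides weaken/isweak):

    use Scalar::Util qw(weaken);
    use Event;

    sub new {
        my $class = shift;
        my $self  = bless {}, $class;

        # The watcher's callback holds only a weak reference to $self,
        # so the object and its watcher no longer keep each other alive.
        my $weak_self = $self;
        weaken($weak_self);

        $self->{event} = Event->timer(
            interval => 1,
            cb       => sub { $weak_self->on_timer(@_) if $weak_self },
        );
        return $self;
    }

    sub on_timer { }   # placeholder callback

    # You would still want to cancel the watcher (e.g. in DESTROY) so the
    # event loop itself stops referencing it.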
      This suggestion is interesting, but it only raises more questions :-) Most of my Events are created like this:
      package Stuff;

      use strict;
      use Event;

      sub new {
          my $class = shift;
          my $self  = {};
          $self->{args} = [@_];
          bless $self, $class;
          $self->{event} = Event->timer(
              interval => 1,
              cb       => [$self, "read"],
          );
          return $self;
      }

      sub read {
          my $self = shift;
          my $e    = shift;
          # do work that uses values from $self
      }
      I see that $self and $self->{event} technically form a circular reference, but since I only create one instance of Stuff, it should not be a source of ongoing leaks. Is there something else I'm missing under the covers of Event?