spx2 has asked for the wisdom of the Perl Monks concerning the following question:

I'm running a script that uses HTML::TreeBuilder, WWW::Mechanize, and Mail::Send.

Its purpose is to scan some HTML and send notifications by email.

Now, because it's supposed to run uninterrupted for long periods of time without any problems, it can't afford to have any memory leaks.

In spite of this, after 1-2 days of running in the background it used all the memory on the server, and there was a lot of memory on that server (12 GB of RAM). It also selfishly grabbed all the CPU cycles (three dual-core CPUs at 1 GHz each).

So, this is very unusual for me; it never happened before. The problem occurred with Perl 5.10.0. I've been looking around some mailing lists and also PerlMonks, and I think someone else also mentioned that the garbage collection mechanism Perl uses may have some problems. (I've found some evidence of people complaining they run scripts and when the scripts exit, they do not release the memory they acquired back to the operating system; that is not very good, although this doesn't happen to me: the script just hogs memory over here and releases it when it exits.)

I would like to add that I have tried to avoid memory leaks and over-consumption on my side by destroying the HTML::TreeBuilder object that I was using.
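
For reference, this is roughly what that clean-up looks like (a minimal sketch; the variable names and the $html content below are just placeholders):

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    # $html stands in for the page content fetched by WWW::Mechanize
    my $html = '<html><body><p>placeholder content</p></body></html>';
    my $tree = HTML::TreeBuilder->new_from_content($html);
    # ... walk the tree and collect whatever goes into the notification ...
    $tree->delete;   # break the internal parent/child links so the tree can be freed
    undef $tree;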

The only solution I know of would be to make a cron job that kills the script after 4-5 hours and restarts it, so that the memory is released.
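
Something along these lines is what I have in mind (the script name and paths below are made-up placeholders):

    # crontab entry (one line): every 5 hours, kill the running copy and start a fresh one
    0 */5 * * * pkill -f scan_notify.pl; sleep 5; nohup /usr/bin/perl /home/stefan/scan_notify.pl >/dev/null 2>&1 &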

Questions: Why do you think this is happening?

Does Perl 5 or Perl 6 offer any way of forcing garbage collection once the programmer knows a variable is not needed any more?

Best regards,

Stefan

Re: Gargantuan memory consumption issues
by Corion (Patriarch) on Dec 22, 2008 at 23:12 UTC

    You might be unaware that WWW::Mechanize keeps a history of visited pages. If your script never quits and only ever keeps one WWW::Mechanize object around, that will accumulate a vast history over time, which will also consume more and more memory.

    Of course, without seeing code, it's hard to tell.

      This is the likely culprit. Either set stack_depth() on the Mech object, or periodically destroy the object and create a new one.
        Thank you, this was indeed the problem.
      +1 on that one. Mechanize is great, but a hog.
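
      For the stack_depth approach mentioned above, a minimal sketch (the URL list and the "fresh object every 100 fetches" figure are only illustrative):

          use strict;
          use warnings;
          use WWW::Mechanize;

          # Option 1: keep no page history at all
          my $mech = WWW::Mechanize->new( stack_depth => 0 );
          # or, on an existing object:  $mech->stack_depth(0);

          # Option 2: periodically throw the object away and start fresh
          my @urls  = ('http://example.com/status') x 500;   # placeholder URLs
          my $count = 0;
          for my $url (@urls) {
              $mech->get($url);
              # ... hand $mech->content to HTML::TreeBuilder, send notifications ...
              if (++$count % 100 == 0) {
                  $mech = WWW::Mechanize->new( stack_depth => 0 );   # the old object is freed here
              }
          }
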
Re: Gargantuan memory consumption issues
by tilly (Archbishop) on Dec 22, 2008 at 23:00 UTC
    Perl uses reference counting rather than garbage collection. So memory is freed as soon as the last reference is removed. The drawback is that data that refers back to itself never gets freed. You need to use tools such as Devel::Leak to track down the memory leaks. Once you're properly freeing memory, your problem should go away.
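
    To make the self-reference point concrete, here is a tiny sketch (the data structure is invented just for illustration); Scalar::Util's weaken() is one way to break such a cycle by hand:

        use strict;
        use warnings;
        use Scalar::Util qw(weaken);

        {
            my $parent = { name => 'parent' };
            my $child  = { name => 'child', parent => $parent };
            $parent->{child} = $child;    # each hash now references the other
            # Without help, neither hash would ever be freed: the cycle keeps
            # both reference counts above zero even after the variables go away.
            weaken($child->{parent});     # make one link a weak reference
        }   # the cycle is broken, so both hashes are freed at this point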

      I would consider "reference counting" one of the possible algorithms of garbage collection; some others are mark-and-sweep, copying, and their generational extensions. To me "garbage collection" means I do not have to deallocate memory explicitly.

      What I would actually like best is a GC that counts references, immediately releases everything it can, and from time to time runs a mark-and-sweep (or copies the still-live data) to clean up the forgotten self-referencing structures. Especially since we are used to reference counting in Perl: we mostly do make sure we do not forget to break cycles, but we also tend to expect that objects are destroyed as soon as they stop being referenced.

        I would call reference counting one of the possible algorithms of memory management. But many different programming environments have settled on the phrase "true garbage collection" for a memory management scheme that can collect circular references. As a result, I don't like calling reference counting "garbage collection".

        On the question of which is better, I have been all over the map. I like reliable destruction mechanics - see my ReleaseAction for proof. But I've also been bitten by internal bugs from reference counting. I've had mod_perl systems whose memory consumption was higher than it should have been, in part because reference counting caused copy-on-write memory to be written (and therefore copied). I like using closure techniques that naturally lead to cycles, and I've jumped through hoops to avoid them.

        My current preference is that I'd like to see true garbage collection, or else reference counting, and I'd like a hybrid system least, not least because I'd be afraid that the reference counting would not be maintained properly and would slowly degrade, while continuing to impose a performance overhead.

Re: Gargantuan memory consumption issues
by Tanktalus (Canon) on Dec 22, 2008 at 23:50 UTC

    Generally, unless it's supposed to run as a server (incoming requests can happen at any time) or start-up time is prohibitive, I prefer the cron-job solution myself anyway.

    For example, with the CB stats, I have one app that's constantly running to watch IRC for CB messages. But that's because connecting to IRC and getting into the #cbstream channel can take a while, and I'll lose information if I'm not already in when the message occurs. However, everything else runs as cron jobs: pre-parsing data in the database (every 6 minutes) and creating the stats page (every 60 minutes).

    The weak point really is that IRC application. I'm going to have to rewrite it at some point to use cb60 and/or the PM CB interface and gather its data via cron job, too, just for stability.

    I've found some evidence of people complaining they run scripts and when the scripts exit, they do not release the memory they acquired back to the operating system

    I didn't think that was possible, except maybe in DOS...

Re: Gargantuan memory consumption issues
by ysth (Canon) on Dec 22, 2008 at 23:39 UTC
Re: Gargantuan memory consumption issues
by bruno (Friar) on Dec 23, 2008 at 16:41 UTC
    +1

    Devel::Leak will point you towards the objects that are not being gc'd. Once you know which objects might be causing the leak, you can use Devel::Cycle to watch those objects, and it will tell you if they contain any circular references.
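
    A minimal usage sketch of Devel::Cycle (the leaky structure here is invented just to trigger a report):

        use strict;
        use warnings;
        use Devel::Cycle;

        # Build a structure with a deliberate cycle, then ask for a report
        my $node = { name => 'leaky' };
        $node->{self} = $node;   # the hash now references itself

        find_cycle($node);       # prints the reference chain that forms each cycle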