ajmcello has asked for the wisdom of the Perl Monks concerning the following question:

If I run the following code, it leaks memory. I monitor some internal websites, and after a few hours or days the memory usage can become outrageous, sometimes running the machine out of memory! I left out any sleep or delay so that the problem shows up quickly; when you run this code you'll see the memory usage climb without bound. You can also change the URL to a bogus, malformed address, like $url="lksjfdlsjf", and you'll see the memory leak happen as well, and much quicker.

Is there a way to resolve this issue?

Thanks in advance!

#!/usr/bin/perl
use DBI;
use DBD::Pg;
use POSIX;
use HTML::Parse;
use LWP::Simple;
use URI::URL;

while(1) {
    $url = "http://cnn.com";
    $content = get $url;
    $content = parse_html($content)->format;
    print "$content\n";
}

Formatting fixed by Chady

Re: Serious memory leak when using LWP
by Eliya (Vicar) on Mar 08, 2012 at 04:12 UTC

    I think you're blaming the wrong module (LWP).  When I take HTML::Parse out of the game, I don't get any leakage.

    OTOH, when using just HTML::Parse, I see it leaking, too.

    use HTML::Parse;

    while(1) {
        parse_html('');
        system "ps -p $$ -o rss,vsz";
        sleep 1;
    }

    (Not that this makes much difference from an end user perspective... but you now at least know what to file a bug report against.)

    The usual workaround for long-running programs with memory leaks is to fork a new worker process every n iterations.
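
    A minimal sketch of that fork-per-batch idea, using only core Perl. The names check_site and run_batch and the batch size of 100 are hypothetical, and check_site is just a stand-in for the leaky fetch-and-parse step. Whatever the child leaks is returned to the OS when it exits:

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # flush output so buffered text is not duplicated by the child

# Hypothetical stand-in for the leaky work (fetching and parsing a page).
sub check_site {
    my ($url) = @_;
    return length $url;
}

# Run $iterations checks in a child process and wait for it to finish.
# Returns the child's exit status.
sub run_batch {
    my ($url, $iterations) = @_;

    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {
        # Child: do the work, then exit so its memory is released.
        check_site($url) for 1 .. $iterations;
        exit 0;
    }

    # Parent: stays small, only supervises.
    waitpid($pid, 0);
    return $? >> 8;
}

# A real monitor would loop forever; three batches keep the demo finite.
for my $batch (1 .. 3) {
    my $status = run_batch('http://cnn.com', 100);
    print "batch $batch exited with status $status\n";
}
```

    The parent never touches the leaky code itself, so its own footprint should stay flat no matter how many batches run.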

      FWIW

      HTML::Parse returns an HTML::Tree object

      It's a known limitation of HTML::Tree: you have to delete() the tree to free the memory, otherwise it will leak a LOT of memory (circular references are circular, and keep objects alive forever, unless they're weak)

      Rejected in 2005/06 Bug #12283 for HTML-Tree: Use weaken to avoid ->delete()?

      With delete there doesn't seem to be much of a leak

      perl -MHTML::Parse -le " system qq[pslist.exe -m $$ 2>NUL]; for(1..40000){ $ju = parse_html(q{ <p>yo</p> }); $junk = $ju->format; $ju->delete; } system qq[pslist.exe -m $$ 2>NUL]; "
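
      Applied to the loop from the original post, the same idea might look like the sketch below. The helper name html_to_text is hypothetical, and the page is passed in as a string so the example does not depend on the network. It assumes HTML::Parse is installed, along with HTML::FormatText, which ->format relies on:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parse;    # exports parse_html

# Hypothetical helper: convert HTML to formatted text, then free the tree.
sub html_to_text {
    my ($html) = @_;

    my $tree = parse_html($html);    # keep a handle on the tree
    my $text = $tree->format;        # same call as in the original post
    $tree->delete;                   # break the circular references

    return $text;
}

# Parse the same snippet repeatedly; with delete, memory should stay
# roughly flat instead of growing on every pass.
my $text;
$text = html_to_text('<p>yo</p>') for 1 .. 1000;
print $text;
```

      The key difference from the original loop is that the tree is held in a variable instead of being chained straight into ->format, so there is something to call delete on.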
        If you notice that any package provides an explicit destructor method ... call it what you will ... USE it. "There must be a reason." If it doesn't, still explicitly set references to "undef" instead of just letting them go out of scope. Is this advice necessary? I don't know.