Possible Memory Leak in HTML::TableExtract

michalgm has asked for the wisdom of the Perl Monks concerning the following question:

Hey all - I'm trying to determine if I'm doing something wrong, or if I've stumbled upon an actual bug. I'm trying to use HTML::TableExtract to parse a directory of HTML files, and I noticed that the memory footprint keeps growing as the process runs. I think I've narrowed the problem down to TableExtract not properly destroying the TreeBuilder object, or something along those lines. I know that the problem doesn't occur when I use TableExtract in its HTML::Parser mode. Here's my test code:

#!/usr/bin/perl 
use HTML::TableExtract qw(tree);
my $table = "<table>" . "<tr><td>1</td><td>2</td></tr>" x 100 . "</table>";
my $html = "<html><body>" . $table x 3 . "</body></html>";
for my $x (0 .. 20) {
    my $p = HTML::TableExtract->new();
    $p->parse($html);   
    $p->eof;
    $p->delete;
    if (-f  "/proc/$$/statm") {
        my $mem = `cat /proc/$$/statm`;
        $mem =~ s/^(\d+).*/$1/s;
        print "$x: $mem\n";
    } 
}

Am I just doing something dumb, or should I go ahead and file a bug report? Thanks!
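
For comparison, a minimal sketch of the same test in the default HTML::Parser mode (no `tree` import), which the post above reports as not showing the growth. It is an illustration under assumptions, not a verified reproduction: `statm_pages` is a hypothetical helper name introduced here, the memory reading assumes a Linux-style `/proc`, and the `$p->delete` call is left out because no tree is built in this mode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;    # no qw(tree): default HTML::Parser mode

# Hypothetical helper: total program size in pages (the first field of
# /proc/<pid>/statm), or undef where that file is unavailable.
sub statm_pages {
    open my $fh, '<', "/proc/$$/statm" or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    my ($pages) = $line =~ /^(\d+)/;
    return $pages;
}

# Same sample document as the tree-mode test: three 100-row tables.
my $table = "<table>" . "<tr><td>1</td><td>2</td></tr>" x 100 . "</table>";
my $html  = "<html><body>" . $table x 3 . "</body></html>";

for my $x (0 .. 20) {
    my $p = HTML::TableExtract->new();
    $p->parse($html);
    $p->eof;
    my $mem = statm_pages();
    print "$x: $mem\n" if defined $mem;
}
```

Running this next to the tree-mode script and comparing the two columns of page counts should show whether the growth is specific to tree mode.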


Re: Possible Memory Leak in HTML::TableExtract by roboticus (Chancellor) on Jan 28, 2009 at 18:47 UTC
michalgm: While I appreciate you not wanting to clutter the post with a bunch of HTML, leaving it out makes it slightly harder for someone to help you, which means you might not get as many responses as you'd like. I'd suggest amending your post to add something like the following line just after the `use` statement: `my $html = "<table>" . "<tr><td>1</td><td>2</td></tr>" x 100 . "</table>";` ...roboticus
Re: Possible Memory Leak in HTML::TableExtract by michalgm (Initiate) on Jan 28, 2009 at 21:07 UTC
Thanks roboticus - I didn't think of doing it that way. I've edited the post to contain some sample HTML.