michalgm has asked for the wisdom of the Perl Monks concerning the following question:

Hey all - I'm trying to determine if I'm doing something wrong, or if I've stumbled upon an actual bug. I'm trying to use HTML::TableExtract to parse a directory of HTML files, and I noticed that the memory footprint continues to grow as the process runs. I think I've narrowed the problem down to TableExtract not properly destroying the TreeBuilder object or something. I know that the problem doesn't occur when I use TableExtract in its HTML::Parser mode. Here's my test code:
    #!/usr/bin/perl
    use HTML::TableExtract qw(tree);

    my $table = "<table>" . "<tr><td>1</td><td>2</td></tr>" x 100 . "</table>";
    my $html  = "<html><body>" . $table x 3 . "</body></html>";

    foreach (my $x = 0; $x <= 20; $x++) {
        my $p = HTML::TableExtract->new();
        $p->parse($html);
        $p->eof;
        $p->delete;
        if (-f "/proc/$$/statm") {
            my $mem = `cat /proc/$$/statm`;
            $mem =~ s/^(\d+).*/$1/s;
            print "$x: $mem\n";
        }
    }
Am I just doing something dumb, or should I go ahead and file a bug report? Thanks!
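As an aside, the memory reading in the test does not have to shell out to cat on every iteration. Here is a minimal sketch that reads the first field of /proc/<pid>/statm (total program size, in pages) directly. It assumes a Linux-style /proc filesystem, and statm_size is a hypothetical helper name, not part of HTML::TableExtract:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::TableExtract): returns the first
# field of /proc/<pid>/statm, i.e. the total program size in pages, or
# undef when the file cannot be read (for example on non-Linux systems).
sub statm_size {
    my ($pid) = @_;
    $pid = $$ unless defined $pid;
    open my $fh, '<', "/proc/$pid/statm" or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    my ($size) = $line =~ /^(\d+)/;
    return $size;
}

# Take a reading, allocate something sizeable, then take another reading.
my $before = statm_size();
my @filler = (1) x 1_000_000;
my $after  = statm_size();

if (defined $before && defined $after) {
    print "before: $before pages, after: $after pages\n";
}
else {
    print "/proc/$$/statm is not available on this system\n";
}
```

Calling statm_size() once per loop iteration in place of the backtick call should give the same number without forking a child process each time.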

Re: Possible Memory Leak in HTML::TableExtract
by roboticus (Chancellor) on Jan 28, 2009 at 18:47 UTC
    michalgm:

    While I appreciate you not wanting to trash up the post with a bunch of html garbage, it does make it slightly more troublesome for someone to assist you, which means you might not get as many responses as you might want. I'd suggest amending your post to add something like the following line just after the use statement:

    my $html = "<table>" . "<tr><td>1</td><td>2</td></tr>" x 100 . "</table>";
    ...roboticus
Re: Possible Memory Leak in HTML::TableExtract
by michalgm (Initiate) on Jan 28, 2009 at 21:07 UTC
    Thanks roboticus - I didn't think of doing it that way. I've edited the post to contain some sample HTML.