in reply to Memory Leak HTML::FormatText

"The problem seems to be with the call to HTML::Format. After the end of the loop there still seem to be variables containing text, etc. rather than it all getting properly scoped out (and hence the memory being returned)."

You're using package variables (INPUT and @INPUT). Their scope is the entire package (i.e. main) and they will persist until the script ends. See "perlmod - Perl modules (packages and symbol tables)".
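To make the distinction concrete, here is a minimal sketch. The names @lines, @scoped and lexical_demo are made up for illustration and do not come from the original script. It shows a package variable staying in the symbol table while a lexical one is released when its scope ends.

```perl
use strict;
use warnings;

# Package variable: stored in the symbol table (%main::) for the whole run.
our @lines = ('a', 'b');

# Hypothetical helper: @scoped is lexical and is released when the sub returns.
sub lexical_demo {
    my @scoped = ('x', 'y');
    return scalar @scoped;
}

my $count       = lexical_demo();
my $still_there = exists $main::{lines} ? 1 : 0;

print "lexical count: $count\n";
print "package variable still in symbol table: $still_there\n";
```

Note that a persistent variable is not by itself a leak; it only keeps its current contents alive until the script ends.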

What you probably want is lexical variables (see my). Try writing your code along these lines:

my $file = 'D:/htmladdresses.txt';
open my $input_fh, '<', $file or die "Can't open '$file' for reading: $!";
while (my $input = <$input_fh>) {
    chomp $input;
    # Use $input here as you were previously using it
}
close $input_fh;

See also: perlsub, perldata, our, local and open.

"I had a look at the nodes discussing memory leaks in general, but frankly did not find them that useful since they all mention tools such as Devel::Peek, Devel::Cycle... - but I cannot find a description of these tools that I as a newbie can understand."

You may find Test::LeakTrace is a little easier to use.
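As a rough starting point, the sketch below counts the SVs leaked by a deliberately leaky sub. It assumes Test::LeakTrace is installed from CPAN (the check is skipped otherwise), and make_cycle is a hypothetical example, not code from the original script.

```perl
use strict;
use warnings;

# Assumption: Test::LeakTrace is installed from CPAN; skip the check if not.
my $have_leaktrace = eval { require Test::LeakTrace; 1 };

# Hypothetical leak: a hash holding a reference to itself is never freed.
sub make_cycle {
    my %node;
    $node{self} = \%node;
    return;
}

my $leaked;
if ($have_leaktrace) {
    # The module is loaded at run time, so pass an explicit anonymous sub.
    $leaked = Test::LeakTrace::leaked_count(sub { make_cycle() });
    print "SVs leaked by make_cycle: $leaked\n";
}
else {
    print "Test::LeakTrace not installed; nothing measured\n";
}
```

Once a small case like this reports a non-zero count, the same call can be wrapped around a single iteration of the real loop.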

-- Ken

Re^2: Memory Leak HTML::FormatText
by PerlNovice999 (Novice) on Sep 16, 2013 at 07:51 UTC

    Thank you for your reply.

    The first part of the code is at the highest level of the program, so @input is used throughout the while loop, which basically is the whole program. The first part just opens a file with website-addresses which get loaded into @input and then the loop cycles through them.

    In any case, I have now changed the code to the following:

    use warnings;
    use strict;
    use diagnostics;
    use HTML::FormatText;
    use HTML::TreeBuilder 5 - weak;
    open INPUT, "< D:/websitelocations.txt" or die "Problem: $!";
    my @INPUT=<INPUT>;
    close INPUT;
    while (@INPUT) {
        my $input=shift(@INPUT);
        chomp $input;
        print $input;
        my $content=HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
        # followed by regular expressions, the results of which are saved printed into a new file, all of this is currently disabled
    }

    The memory leak is still there though. It runs out of memory after about 3000 files, but I have more than 28 000.

    I still think it has something to do with HTML::FormatText. I read elsewhere that it calls HTML::TreeBuilder, which in the past has caused memory leaks when the object was not explicitly deleted. I have now added

    use HTML::TreeBuilder 5 - weak;

    which should take care of it according to the CPAN documentation for HTML::TreeBuilder. Apparently it does not, however. I also tried adding explicit calls to the delete function:

    $content->delete(new)

    As well as

    $input->delete(new)

    But this just gives me an error message: can't locate object message

      What is $content->delete(new) supposed to be or do (what is the string "new")?

      Nevermind

      Here is my tip: run Data::Dumper on an object after one or 10 files, and look for references.

      Note especially the bless'ed package names.

      Then go write some destructors; it's what I did for bugs in HTML::TableExtract/HTML::TableExtract Memory Usage

      Since you're using sub format_file { I'd copy/paste its source and Dumper the objects involved to find circular-references $VAR1 = { ... \$VAR1 };
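A minimal sketch of what such a circular reference looks like in Dumper output, and one way to break it with Scalar::Util's weaken. The $parent and $child hashes are made up for illustration; real HTML objects will be much larger.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(weaken isweak);

# Hypothetical two-node structure: parent -> child -> parent is a cycle.
my $parent = { name => 'parent' };
my $child  = { name => 'child', parent => $parent };
$parent->{child} = $child;

# In the dump, the back-reference appears as $VAR1 inside the nested hash.
my $dump = Dumper($parent);
print $dump;

# Weakening the back-reference lets the reference counts drop to zero later.
weaken($child->{parent});
print "back-reference is weak: ", (isweak($child->{parent}) ? "yes" : "no"), "\n";
```

In a dump of a real object, searching the output for $VAR1 on the right-hand side of a key is a quick way to spot such cycles.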

        The following code (adding the delete command in the eval call) works (no error message: "can't call method..."), but there is still a memory leak.

        use warnings;
        use strict;
        use diagnostics;
        use HTML::FormatText;
        use HTML::TreeBuilder 5 - weak;
        use constant HAS_LEAKTRACE => eval{ require Test::LeakTrace };
        use Test::More HAS_LEAKTRACE ? (tests => 1) : (skip_all => 'require Test::LeakTrace');
        use Test::LeakTrace;
        leaks_cmp_ok{
            open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!";
            # The file contains the addresses of 28 000 websites
            my @INPUT=<INPUT>;
            close INPUT;
            while (@INPUT) {
                my $input=shift(@INPUT);
                #my $proposal;
                chomp $input;
                print $input;
                my $content=HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
                eval { $content->delete; };
                # followed by regular expressions, the results of which are saved in a different file - all now disabled
            }
        } '<', 1;

        Sorry, my mistake, this was supposed to read

        $content->delete();

        The hope was that this would destroy the TreeBuilder object, which apparently gets built via HTML::FormatText, and thus prevent the memory leak. There is a reference to a delete function in the CPAN documentation, but I just get the error message "Can't call method...". The same goes for this line, which I just tried now after reading your earlier entry:

        $content->eof;
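One likely reason for the "Can't call method..." error: per the module's synopsis, format_file appears to return the formatted text as an ordinary string, so $content is not an object and has no delete or eof method. A minimal sketch of the alternative, building the tree explicitly so there is an object to delete, is shown below. It assumes HTML::TreeBuilder and HTML::FormatText are installed; format_and_free and the temporary test page are hypothetical and only there to make the example runnable.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Assumption: both modules are installed from CPAN; skip the work if not.
my $have_modules = eval { require HTML::TreeBuilder; require HTML::FormatText; 1 };

# Hypothetical helper: format one file, then free its parse tree explicitly.
sub format_and_free {
    my ($path) = @_;
    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($path);                 # build the tree ourselves
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
    my $text = $formatter->format($tree);     # a plain string, not an object
    $tree->delete;                            # delete is called on the tree, not on $text
    return $text;
}

# Throwaway input so the sketch runs without the original address file.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} "<html><body><p>Hello from a test page</p></body></html>";
close $fh;

my $text = $have_modules ? format_and_free($path) : '';
print $text;
```

Whether this removes the leak in the original script is untested; it only gives delete a real object to act on.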

        In my reading of the CPAN documentation of Data::Dumper (not pretending that I understood most of it...) I would need to know the variable names that I am tracking first. But I guess my problem is exactly that I do not know them. It seems that HTML::TreeBuilder is creating something in the background.

        I do not think that this is what you were suggesting, but I tried the following code:

        use warnings;
        use strict;
        use diagnostics;
        use HTML::FormatText;
        use Data::Dumper;
        open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!";
        my @INPUT=<INPUT>;
        close INPUT;
        while (@INPUT) {
            my $inputfile=shift(@INPUT);
            chomp $inputfile;
            my $content=HTML::FormatText->format_from_file($inputfile);
            print Dumper($_, $');
        }

        The output is $VAR1 = undef; and $VAR2 = undef; - no problem there I guess...

Re^2: Memory Leak HTML::FormatText
by PerlNovice999 (Novice) on Sep 16, 2013 at 09:07 UTC

    Thank you for pointing out Test::LeakTrace. I have tried this now, using the code from the CPAN example, but I do not receive any report.

    use warnings;
    use strict;
    use diagnostics;
    use HTML::FormatText;
    use HTML::TreeBuilder 5 - weak;
    use constant HAS_LEAKTRACE => eval{ require Test::LeakTrace };
    use Test::More HAS_LEAKTRACE ? (tests => 1) : (skip_all => 'require Test::LeakTrace');
    use Test::LeakTrace;
    leaks_cmp_ok{
        open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!";
        # The file contains 28000 addresses of websites (each one on a new line)
        my @INPUT=<INPUT>;
        close INPUT;
        while (@INPUT) {
            my $input=shift(@INPUT);
            chomp $input;
            print $input;
            my $content=HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
            # This is followed by regular expressions, the results of which are saved in a new file; all of this is disabled now.
        }
    } '<', 1;

    The program just runs out of memory after about 3000 passes through the while loop, and the program output is not followed by any report from LeakTrace. I also did not see any reference in the CPAN documentation for LeakTrace to a file in which the report is saved. Not sure what to do now...

    I am running this in ActiveState's Komodo, but there is no output from LeakTrace, either in the internal output window in Komodo or in the external shell.
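One possible way to get a report at all is to run the check on a small batch that finishes before memory runs out, since leaks_cmp_ok can only report once its block has completed. The sketch below uses leaked_info, which returns its findings as a list rather than writing a file. It assumes Test::LeakTrace is installed; process_file and the page names are hypothetical stand-ins for the real per-file work and the real address list.

```perl
use strict;
use warnings;

# Assumption: Test::LeakTrace is installed from CPAN; skip the check if not.
my $have_leaktrace = eval { require Test::LeakTrace; 1 };

# Hypothetical stand-in for the real per-file work (format_file plus regexes).
sub process_file {
    my ($path) = @_;
    return length $path;
}

# Placeholder names; the real script would read them from its address file.
my @all_files = map { "page$_.html" } 1 .. 28_000;
my @sample    = @all_files[0 .. 9];    # small enough to finish within memory

my @leaks;
if ($have_leaktrace) {
    # leaked_info returns one [$ref, $file, $line] entry per leaked SV.
    @leaks = Test::LeakTrace::leaked_info(sub { process_file($_) for @sample });
    for my $leak (@leaks) {
        my (undef, $file, $line) = @$leak;
        printf STDERR "leaked SV allocated at %s line %s\n", $file // '?', $line // '?';
    }
}
print "files checked: ", scalar(@sample), ", leaked SVs: ", scalar(@leaks), "\n";
```

The file and line of each entry point at where the leaked value was allocated, which should narrow down which module is holding on to memory.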

      LeakTrace: I now found the following comment in the overall output (just not at the very end, where I expected it):

      "Looks like your test exited with 1 before it could output anything"