in reply to Re: Memory Leak HTML::FormatText
in thread Memory Leak HTML::FormatText

Thank you for your reply.

The first part of the code is at the top level of the program, so @input is used throughout the while loop, which is basically the whole program. The first part just opens a file of website addresses, loads them into @input, and then the loop cycles through them.

In any case, I have now changed the code to the following:

use warnings;
use strict;
use diagnostics;
use HTML::FormatText;
use HTML::TreeBuilder 5 -weak;

open INPUT, "< D:/websitelocations.txt" or die "Problem: $!";
my @INPUT = <INPUT>;
close INPUT;

while (@INPUT) {
    my $input = shift(@INPUT);
    chomp $input;
    print $input;
    my $content = HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
    # followed by regular expressions, the results of which are saved (printed)
    # into a new file; all of this is currently disabled
}

The memory leak is still there though. It runs out of memory after about 3000 files, but I have more than 28 000.

I still think it has something to do with HTML::FormatText. I read elsewhere that it calls HTML::TreeBuilder, which in the past has caused memory leaks when the object was not explicitly deleted. I have now added

use HTML::TreeBuilder 5 -weak;

which should take care of it according to the CPAN documentation on HTML::TreeBuilder. However, apparently it does not. I also tried to add explicit calls to the delete function:

$content->delete(new)

As well as

$input->delete(new)

But this just gives me an error message: can't locate object method

Re^3: Memory Leak HTML::FormatText
by Anonymous Monk on Sep 16, 2013 at 08:14 UTC

    What is $content->delete(new) supposed to be or do (what is the string "new")?

    Never mind.

    Here is my tip: do a Data::Dumper of an object after one or ten files, and look for references.

    Note especially the bless'ed package names.

    Then go write some destructors; it's what I did for bugs in HTML::TableExtract (see HTML::TableExtract Memory Usage).

    Since you're using format_file, I'd copy/paste its source and Dumper the objects involved to find circular references, which show up as $VAR1 = { ... \$VAR1 };
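To make the tip above concrete, here is a minimal sketch of what a circular reference looks like in Data::Dumper output and why it keeps memory alive. It uses only core modules. The Node class and the helper names (build_pair, count_back_references, destroyed_after_scope) are hypothetical stand-ins for illustration; they are not part of HTML::TreeBuilder or HTML::FormatText. Breaking the cycle with Scalar::Util::weaken is shown as one possible remedy, assumed here in place of a hand-written destructor.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(weaken);

{
    # Hypothetical node class that counts how many objects were destroyed.
    package Node;
    our $destroyed = 0;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub DESTROY { $destroyed++ }
}

# Build a parent and a child that point at each other.
# With weak => 1 the child's link back to the parent is weakened.
sub build_pair {
    my (%opt) = @_;
    my $parent = Node->new(name => 'parent');
    my $child  = Node->new(name => 'child', parent => $parent);
    $parent->{child} = $child;
    weaken($child->{parent}) if $opt{weak};
    return $parent;
}

# Count how often the dump refers back to $VAR1.
# The first match is the "$VAR1 =" header, so it is subtracted.
sub count_back_references {
    my ($data) = @_;
    my $dump  = Dumper($data);
    my $count = () = $dump =~ /\$VAR1\b/g;
    return $count - 1;
}

# Create a pair, let it go out of scope, and report how many
# Node objects were actually destroyed.
sub destroyed_after_scope {
    my (%opt) = @_;
    $Node::destroyed = 0;
    { my $pair = build_pair(%opt); }
    return $Node::destroyed;
}

print "back-references in dump: ", count_back_references(build_pair()), "\n";
print "destroyed with strong cycle: ", destroyed_after_scope(weak => 0), "\n";
print "destroyed with weak link:    ", destroyed_after_scope(weak => 1), "\n";
```

With the strong cycle neither object is freed when the variable goes out of scope, which is the kind of leak that adds up over thousands of files. Once the back-link is weak, both objects are freed.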

      The following code (adding the delete command in the eval call) works (no error message: "can't call method..."), but there is still a memory leak.

      use warnings;
      use strict;
      use diagnostics;
      use HTML::FormatText;
      use HTML::TreeBuilder 5 -weak;
      use constant HAS_LEAKTRACE => eval { require Test::LeakTrace };
      use Test::More HAS_LEAKTRACE ? (tests => 1) : (skip_all => 'require Test::LeakTrace');
      use Test::LeakTrace;

      leaks_cmp_ok {
          open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!";
          # The file contains the addresses of 28 000 websites
          my @INPUT = <INPUT>;
          close INPUT;

          while (@INPUT) {
              my $input = shift(@INPUT);
              #my $proposal;
              chomp $input;
              print $input;
              my $content = HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50);
              eval { $content->delete; };
              # followed by regular expressions, the results of which are saved
              # in a different file - all now disabled
          }
      } '<', 1;

      Sorry, my mistake, this was supposed to read

      $content->delete();

      The hope was that this would destroy the TreeBuilder object, which apparently gets built via HTML::FormatText, and thus prevent the memory leak. There is a reference to a delete function in the CPAN documentation, but I just get the error message "Can't call method...". The same goes for this line, which I just tried now after reading your earlier entry:

      $content->eof;

      In my reading of the CPAN documentation of Data::Dumper (not pretending that I understood most of it...) I would need to know the variable names that I am tracking first. But I guess my problem is exactly that I do not know them. It seems that HTML::TreeBuilder is creating something in the background.

      I do not think that this is what you were suggesting, but I tried the following code:

      use warnings;
      use strict;
      use diagnostics;
      use HTML::FormatText;
      use Data::Dumper;

      open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!";
      my @INPUT = <INPUT>;
      close INPUT;

      while (@INPUT) {
          my $inputfile = shift(@INPUT);
          chomp $inputfile;
          my $content = HTML::FormatText->format_from_file($inputfile);
          print Dumper($_, $´);
      }

      The output is $VAR1 = undef; and $VAR2 = undef; - no problem there I guess...

        Where do $_ and $` appear in your code before now?

        Try Dumper()ing $content
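As a rough illustration of what dumping $content can tell you, the sketch below classifies a value before dumping it. It assumes, based on the HTML::FormatText documentation, that format_file returns the formatted text as a plain string rather than an object. If that holds, it would explain why method calls such as delete or eof on $content fail. The describe helper and the sample string are hypothetical and stand in for the real return value.

```perl
use strict;
use warnings;
use Data::Dumper;
use Scalar::Util qw(blessed);

# Say what kind of value we are holding: undef, an object,
# an unblessed reference, or a plain scalar.
sub describe {
    my ($value) = @_;
    return 'undef' unless defined $value;
    my $class = blessed($value);
    return "object of class $class" if defined $class;
    return 'unblessed ' . ref($value) . ' reference' if ref $value;
    return 'plain string or number';
}

# Stand-in for the return value of format_file (assumption: plain text).
my $content = "Formatted text\n";

print describe($content), "\n";
print Dumper($content);

# Only something reported as an object can have methods called on it.
print describe(bless {}, 'Some::Class'), "\n";
```

If describe reports a plain string for the real $content, there is no tree object left to delete at that point, and the leak has to be looked for inside the objects that format_file creates internally.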