Re: Memory Leak HTML::FormatText

Re^2: Memory Leak HTML::FormatText by PerlNovice999 (Novice) on Sep 16, 2013 at 07:51 UTC
Thank you for your reply. The first part of the code is at the top level of the program, so @input is used throughout the while loop, which is basically the whole program. The first part just opens a file of website addresses, which are loaded into @input, and then the loop cycles through them. In any case, I have now changed the code to the following: `use warnings; use strict; use diagnostics; use HTML::FormatText; use HTML::TreeBuilder 5 -weak; open INPUT, "< D:/websitelocations.txt" or die "Problem: $!"; my @INPUT=<INPUT>; close INPUT; while (@INPUT) { my $input=shift(@INPUT); chomp $input; print $input; my $content=HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50); # followed by regular expressions, the results of which are saved/printed into a new file; all of this is currently disabled }` The memory leak is still there, though. The program runs out of memory after about 3000 files, but I have more than 28 000. I still think it has something to do with HTML::FormatText. I read elsewhere that it calls HTML::TreeBuilder, which in the past has caused memory leaks when the object was not explicitly deleted. I have now added `use HTML::TreeBuilder 5 -weak;`, which should take care of it according to the CPAN documentation for HTML::TreeBuilder. Apparently it does not. I also tried adding explicit calls to the delete function: `$content->delete(new)` as well as `$input->delete(new)`. But this just gives me an error message: "can't locate object method".
Re^3: Memory Leak HTML::FormatText by Anonymous Monk on Sep 16, 2013 at 08:14 UTC
What is `$content->delete(new)` supposed to be or do? (What is the string "new"?) Never mind, here is my tip: run Data::Dumper on an object after one or ten files and look for references, noting especially the bless'ed package names. Then go write some destructors; it's what I did for bugs in HTML::TableExtract (see "HTML::TableExtract Memory Usage"). Since you're using `sub format_file {`, I'd copy/paste its source and Dumper the objects involved to find circular references such as `$VAR1 = { ... \$VAR1 };`
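A minimal sketch of the Data::Dumper tip, using only core modules. The helper name `dump_has_backrefs` and the toy parent/child hashes are assumptions made for illustration; they stand in for whatever objects `format_file` builds internally. Data::Dumper prints `$VAR1` again wherever a structure refers back to itself, so searching the dump for a second `$VAR1` is a rough way to spot candidates for circular references.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the thread): true if the dump of $ref
# mentions $VAR1 anywhere other than the opening "$VAR1 =" assignment.
# Data::Dumper prints such back-references for cycles and also for a
# reference that merely appears twice, so a hit is a hint, not proof.
sub dump_has_backrefs {
    my ($ref) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable key order, easier to read
    my $dump = Dumper($ref);
    return $dump =~ /\$VAR1(?!\s*=)/ ? 1 : 0;
}

# Toy data standing in for a parsed tree: a child that points at its parent.
my $parent = { tag => 'body', children => [] };
my $child  = { tag => 'p', parent => $parent };
push @{ $parent->{children} }, $child;

print Dumper($parent);
print "back-references found\n" if dump_has_backrefs($parent);
```

The regular expression is a heuristic: a string value that happens to contain the text `$VAR1` would also match, so the dump itself is still worth reading.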
Re^4: Memory Leak HTML::FormatText by PerlNovice999 (Novice) on Sep 16, 2013 at 09:23 UTC
The following code (with the delete call wrapped in eval) runs without the "Can't call method..." error message, but there is still a memory leak. `use warnings; use strict; use diagnostics; use HTML::FormatText; use HTML::TreeBuilder 5 -weak; use constant HAS_LEAKTRACE => eval{ require Test::LeakTrace }; use Test::More HAS_LEAKTRACE ? (tests => 1) : (skip_all => 'require Test::LeakTrace'); use Test::LeakTrace; leaks_cmp_ok{ open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!"; # The file contains the addresses of 28 000 websites my @INPUT=<INPUT>; close INPUT; while (@INPUT) { my $input=shift(@INPUT); #my $proposal; chomp $input; print $input; my $content=HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50); eval { $content->delete; }; # followed by regular expressions, the results of which are saved in a different file - all now disabled } } '<', 1;`
Re^4: Memory Leak HTML::FormatText by PerlNovice999 (Novice) on Sep 16, 2013 at 09:17 UTC
Sorry, my mistake, this was supposed to read `$content->delete();` The hope was that this would destroy the TreeBuilder object, which apparently gets built via HTML::FormatText, and thus prevent the memory leak. There is a reference to a delete function in the CPAN documentation, but I just get the error message "Can't call method...". The same goes for this line, which I tried just now after reading your earlier entry: `$content->eof;`
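For context, `format_file` returns the formatted text as a plain string, not an object, which would explain why calling `delete` or `eof` on `$content` fails. The sketch below is one possible way to keep the parse tree in a variable of your own so that `delete` can be called on it. The helper name `html_file_to_text` is hypothetical, and whether this removes the leak discussed in this thread is not established here.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): convert one HTML file to text
# while holding the parse tree ourselves, so we can delete it explicitly.
sub html_file_to_text {
    my ($path) = @_;

    # Loaded at run time so the sub can be defined even where the
    # CPAN modules are missing; calling it still requires them.
    require HTML::TreeBuilder;
    require HTML::FormatText;

    my $tree = HTML::TreeBuilder->new;
    $tree->parse_file($path) or die "Cannot parse $path: $!";

    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
    my $text      = $formatter->format($tree);    # plain string, not an object

    $tree->delete;    # break the parent/child links inside the tree
    return $text;
}
```

In the loop from the earlier post, `my $content = html_file_to_text($input);` would take the place of the `format_file` call. Reading the address list one line at a time (`while (my $input = <$fh>)`) instead of slurping it into an array is a separate, optional change.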
Re^5: Memory Leak HTML::FormatText by Anonymous Monk on Sep 16, 2013 at 10:14 UTC
Re^4: Memory Leak HTML::FormatText by PerlNovice999 (Novice) on Sep 18, 2013 at 12:25 UTC
In my reading of the CPAN documentation for Data::Dumper (not pretending that I understood most of it...), I would first need to know the names of the variables I am tracking. But I guess my problem is exactly that I do not know them. It seems that HTML::TreeBuilder is creating something in the background. I do not think this is what you were suggesting, but I tried the following code: `use warnings; use strict; use diagnostics; use HTML::FormatText; use Data::Dumper; open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!"; my @INPUT=<INPUT>; close INPUT; while (@INPUT) { my $inputfile=shift(@INPUT); chomp $inputfile; my $content=HTML::FormatText->format_from_file($inputfile); print Dumper($_, $'); }` The output is `$VAR1 = undef;` and `$VAR2 = undef;`, so no problem there, I guess...
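Data::Dumper takes values rather than variable names, so any variable already in scope can be passed to it directly; `$_` and `$'` were simply undefined at that point in the loop. A small sketch, assuming a hypothetical helper `dump_named` and a stand-in string for `$content`:

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the thread): dump one value under a label,
# using the two-argument Data::Dumper->Dump([values], [names]) form.
sub dump_named {
    my ($label, $value) = @_;
    local $Data::Dumper::Sortkeys = 1;    # stable key order
    return Data::Dumper->Dump([$value], [$label]);
}

# Stand-in for the value returned by format_from_file in the loop above.
my $content = "Some formatted text\n";

print dump_named(content => $content);
print dump_named(options => { leftmargin => 0, rightmargin => 50 });
```

Dumping `$content` only shows a string. To look for circular references, the interesting values are the tree and formatter objects created inside `format_file`, which is why the earlier reply suggests copying that sub's source and dumping the objects there.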
Re^5: Memory Leak HTML::FormatText by Anonymous Monk on Sep 18, 2013 at 13:27 UTC
Re^6: Memory Leak HTML::FormatText by PerlNovice999 (Novice) on Sep 18, 2013 at 13:34 UTC
Re^2: Memory Leak HTML::FormatText by PerlNovice999 (Novice) on Sep 16, 2013 at 09:07 UTC
Thank you for pointing out Test::LeakTrace. I have tried it now, using the code from the CPAN example, but I do not receive any report. `use warnings; use strict; use diagnostics; use HTML::FormatText; use HTML::TreeBuilder 5 -weak; use constant HAS_LEAKTRACE => eval{ require Test::LeakTrace }; use Test::More HAS_LEAKTRACE ? (tests => 1) : (skip_all => 'require Test::LeakTrace'); use Test::LeakTrace; leaks_cmp_ok{ open INPUT, "< D:/websiteadresses.txt" or die "Problem: $!"; # The file contains 28000 addresses of websites (each one on a new line) my @INPUT=<INPUT>; close INPUT; while (@INPUT) { my $input=shift(@INPUT); chomp $input; print $input; my $content=HTML::FormatText->format_file($input, leftmargin => 0, rightmargin => 50); # This is followed by regular expressions, the results of which are saved in a new file; all of this is disabled now. } } '<', 1;` The program just runs out of memory after about 3000 passes through the while loop, and the program output is not followed by any report from LeakTrace. I also did not see any reference in the CPAN documentation for Test::LeakTrace to a file in which the report is saved. Not sure what to do now... I am running this in ActiveState's Komodo, but there is no output from LeakTrace either in Komodo's internal output window or in the external shell.
Re^3: Memory Leak HTML::FormatText by PerlNovice999 (Novice) on Sep 16, 2013 at 09:45 UTC
LeakTrace: I have now found the following comment in the overall output (just not at the very end, where I expected it): "Looks like your test exited with 1 before it could output anything".
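That message comes from the test harness and suggests the script exited before the traced block finished, so no leak report could be printed. One option, sketched below under assumptions, is to trace a small workload that completes quickly. The helper `count_leaks` and the deliberately leaky stand-in workload are hypothetical; in the real script the block would call `format_file` on a handful of files instead of all 28 000.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): number of SVs that survive
# the given code block, or undef if Test::LeakTrace is not installed.
sub count_leaks {
    my ($code) = @_;
    return unless eval { require Test::LeakTrace; 1 };
    return Test::LeakTrace::leaked_count($code);
}

# Stand-in workload with a deliberate reference cycle on each pass.
my $leaky = sub {
    for (1 .. 5) {
        my $node = {};
        $node->{self} = $node;    # cycle: never freed by reference counting
    }
};

my $count = count_leaks($leaky);
print defined $count
    ? "leaked SVs: $count\n"
    : "Test::LeakTrace not installed\n";
```

A non-zero count on a few files would confirm a leak without waiting for the process to run out of memory.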