Re: Unnesting deeply nested HTML elements (Deep recursion on subroutine "HTML::Element::delete")
by hippo (Archbishop) on Sep 19, 2022 at 17:00 UTC
|
I get the following error
It isn't an error, it's a warning. You can disable it if you like with no warnings 'recursion'; but be sure to comment why you are doing that and do it in the smallest lexical scope you can manage.
| [reply] [d/l] |
|
|
Thanks. I've tried adding the no warnings 'recursion'; to both the example script above and to the real script (there in the smallest lexical scope available). It does not suppress the warnings in either case.
I wonder if there would be a way to simply collect and not print the warnings, perhaps with an eval. However, the attempt below still prints the same warnings as the original example script above.
#!/usr/bin/perl
use HTML::TreeBuilder::XPath -weak;
use strict;
use warnings;
my $ent = HTML::TreeBuilder::XPath->new;
$ent->parse_file(\*DATA);
eval {
    no warnings 'recursion';
    $ent->delete;
};
if ($@) {
    print "FOO\n";
}
exit(0);
__DATA__
<html>
<head>
<title>foo bar</title>
</head>
<body>
foo
<br />
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<center>
<strong>bar</strong>
<br />
<center>(baz)</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</center>
</body>
</html>
| [reply] [d/l] [select] |
|
|
Yeah, warnings are lexically scoped, so turning them off in one place only suppresses them if that's where they are generated.
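A minimal sketch of that point, using only core Perl. The subs noisy and quiet are made-up stand-ins (noisy plays the role of library code such as HTML::Element::delete, which is compiled with the warning enabled); nothing here is taken from HTML::Element itself. On a perl built with the usual threshold I would expect one warning to be recorded for noisy and none for quiet.

```perl
use strict;
use warnings;

# Record warnings instead of printing them, so the two cases can be compared.
my @seen;
$SIG{__WARN__} = sub { push @seen, $_[0] };

# Hypothetical stand-in for library code: the recursive call below is
# compiled with the 'recursion' warning still enabled.
sub noisy {
    my ($n) = @_;
    noisy($n - 1) if $n;
    return;
}

# Same shape, but the pragma sits in the scope that contains the
# recursive call itself.
sub quiet {
    my ($n) = @_;
    no warnings 'recursion';
    quiet($n - 1) if $n;
    return;
}

{
    # This only covers the outermost call statement, not the calls
    # made from inside noisy().
    no warnings 'recursion';
    noisy(150);
}
my $from_noisy = @seen;

@seen = ();
quiet(150);
my $from_quiet = @seen;

print "noisy: $from_noisy warning(s), quiet: $from_quiet warning(s)\n";
```

The pragma has to cover the statement that makes the deep call, which for a module's own recursion means code you do not control.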
In this case you need to get a bit more invasive: catch all warnings for the duration of the call, and rethrow all but the one you want to avoid:
{
    local $SIG{__WARN__} = sub {
        warn @_ unless $_[0] =~ /^Deep recursion/;
    };
    $ent->delete;
}
Note that it is safe to warn inside the warnings handler - the handler is suppressed while it is being called.
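If the goal is to collect the warnings rather than discard them, the same handler can push them onto an array. This is only a sketch under assumptions: recurse is a stand-in for the real $ent->delete call, and @collected is a name I made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for the deeply recursive library call.
sub recurse {
    my ($count) = @_;
    recurse($count - 1) if $count;
    return;
}

my @collected;    # "Deep recursion" warnings end up here instead of on STDERR
{
    local $SIG{__WARN__} = sub {
        my ($message) = @_;
        if ($message =~ /^Deep recursion/) {
            push @collected, $message;
        }
        else {
            warn $message;    # anything else is passed through unchanged
        }
    };
    recurse(150);
    warn "an unrelated warning still gets through\n";
}

print "collected ", scalar(@collected), " deep recursion warning(s)\n";
```

Because the handler is installed with local, the previous behaviour comes back as soon as the block is left.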
Re: Unnesting deeply nested HTML elements (Deep recursion on subroutine "HTML::Element::delete")
by GrandFather (Saint) on Sep 19, 2022 at 21:17 UTC
|
| [reply] |
|
|
#!/usr/bin/perl -d:Confess
In that way I can at least see which line in my script the problem comes from. That in turn helps figure out which data is at fault.
| [reply] [d/l] [select] |
Re: Unnesting deeply nested HTML elements (Deep recursion on subroutine "HTML::Element::delete")
by GrandFather (Saint) on Sep 20, 2022 at 03:38 UTC
|
I thought I'd have a play with your issue so I downloaded your code and ran it. No warnings! So I wrote the following:
use strict;
use warnings;
print "$^V\n";
recurse(20000);
exit;
sub recurse {
    my ($count) = @_;
    recurse($count - 1) if $count;
}
which prints
v5.32.1
No warnings! My guess is that the warning dropped out of Perl at some point, but I can't find anything in perldelta to indicate it's been "fixed". Maybe Strawberry Perl has its recursion limit warning set to some really large value? I do get an "Out of memory!" error if I set the recursion depth to 700,000.
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
| [reply] [d/l] [select] |
|
|
v5.32.0
Deep recursion on subroutine "main::recurse" at testfile line 11.
v5.34.0
Deep recursion on subroutine "main::recurse" at testfile line 11.
v5.36.0
Deep recursion on subroutine "main::recurse" at testfile line 11.
I don't have 5.32.1 handy, but I'd be astonished if this had been broken.
|
|
I can't think of anything such as an environment variable or init script that may be coming into play. I'm running this on a fairly clean 64 bit Strawberry Perl without any environment tweaks explicitly made by me. I'll try again on my home machine.
Update Same result at home. Same Perl version, but Windows 11 rather than Windows 10.
Optimising for fewest key strokes only makes sense transmitting to Pluto or beyond
| [reply] |
Re: Unnesting deeply nested HTML elements (Deep recursion on subroutine "HTML::Element::delete")
by kcott (Archbishop) on Sep 20, 2022 at 07:57 UTC
|
G'day mldvx4,
Look at perldiag: Deep recursion on subroutine "%s".
Notice the value of 100 there.
You have 102 levels of <center>...</center> nesting.
Try reducing that to below 100 and see if there's any difference.
Keep reducing until the warning stops:
the recursion:nesting ratio may not be 1:1.
You also have a couple of other levels with <html>...</html> and <body>...</body>.
If you really do need what you've shown,
you'll have to recompile Perl with PERL_SUB_DEPTH_WARN set to an appropriate number.
You might want to look at Perlbrew
to compile a separate installation for this task.
See also: "Deep Recursion Limit".
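To see where the threshold sits on a given perl without recompiling anything, a small probe along these lines might help. It is a sketch, not part of any module: descend, $calls_in_progress and $frames_when_warned are names invented here, and it only measures plain sub recursion, not whatever HTML::Element does per nesting level.

```perl
use strict;
use warnings;

my $calls_in_progress = 0;    # descend() frames currently active
my $frames_when_warned;       # call depth at which the warning arrived

sub descend {
    my ($remaining) = @_;
    $calls_in_progress++;
    descend($remaining - 1) if $remaining > 0;
    $calls_in_progress--;
    return;
}

{
    local $SIG{__WARN__} = sub {
        my ($message) = @_;
        if ($message =~ /^Deep recursion/) {
            # The warning is raised while the next call is being entered,
            # so that frame has not bumped the counter yet.
            $frames_when_warned //= $calls_in_progress + 1;
        }
        else {
            warn $message;
        }
    };
    descend(500);
}

if (defined $frames_when_warned) {
    print "Deep recursion warning at call depth $frames_when_warned\n";
}
else {
    print "No deep recursion warning up to depth 501\n";
}
```

Comparing that number with the nesting depth at which the HTML stops triggering the warning gives a rough idea of the recursion:nesting ratio.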
| [reply] [d/l] [select] |
Re: Unnesting deeply nested HTML elements (Deep recursion on subroutine "HTML::Element::delete")
by Anonymous Monk on Sep 19, 2022 at 21:41 UTC
|
calling the delete method?
Perhaps you don't need to? Based on line numbers in diagnostic messages, you have modern HTML::Element, which made calling delete superfluous. Maybe just check that's the case with "use HTML::TreeBuilder::XPath -weak;" once, to be sure, but new behaviour is default w/o explicit import.
| [reply] [d/l] [select] |
|
|
Thanks. I see the same warnings whether I have "use HTML::TreeBuilder::XPath -weak;" or plain old "use HTML::TreeBuilder::XPath;" there at the beginning. What would have been the expected difference? Or, in other words, how would I know for sure whether I can omit the deletion?
| [reply] [d/l] [select] |
|
|
I think anon is telling you that you do not need to use delete at all with "modern" HTML::Element. See delete. OTOH even if you don't explicitly use delete perhaps HTML::Element will (edit: see Edit2 below), when you undef an element. And thus you will still get the warnings.
Generally, "deep recursion" warnings are not harmful in themselves, because you may well have a structure that is more than 100 levels deep. And that's fine (until your memory is exhausted). However, the real problem is whether WordPress managed to produce some HTML whose parsing somehow results in cyclical paths. Then you may get infinite recursion, and that's really bad. I would investigate that before suppressing the warnings.
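One way to investigate is to walk the structure with an explicit stack and check whether a node is ever reached again while it is still on the current path. The sketch below assumes plain hash nodes with a children array, which is a hypothetical layout and not HTML::Element's real internals; has_cycle is a made-up helper name.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Returns 1 if following child links from $root ever leads back to a node
# that is still on the current path, 0 otherwise. Uses an explicit stack,
# so the check itself cannot trigger deep recursion.
sub has_cycle {
    my ($root) = @_;
    my %on_path = (refaddr($root) => 1);
    my @stack   = ([$root, 0]);    # [node, index of next child to visit]
    while (@stack) {
        my $frame    = $stack[-1];
        my ($node, $i) = @$frame;
        my $children = $node->{children} || [];
        if ($i < @$children) {
            $frame->[1]++;
            my $child = $children->[$i];
            return 1 if $on_path{ refaddr($child) };
            $on_path{ refaddr($child) } = 1;
            push @stack, [$child, 0];
        }
        else {
            delete $on_path{ refaddr($node) };
            pop @stack;
        }
    }
    return 0;
}

# A small tree in which one leaf is shared by two parents (not a cycle).
my $leaf = { children => [] };
my $tree = { children => [ { children => [$leaf] }, { children => [$leaf] } ] };
my $before = has_cycle($tree);

# Now make the leaf point back at the root, which does create a cycle.
push @{ $leaf->{children} }, $tree;
my $after = has_cycle($tree);

print "before: $before, after: $after\n";
```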
bw, bliako
Edit: by delete I mean HTML::Element::delete()
Edit2: with weak references ON, as anon mentioned, it's the Perl interpreter/garbage collector that does the cleaning up as soon as the parent object goes out of scope or is set to undef. I am trying not to give the impression that delete will be called internally under the "modern" regime.
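To illustrate the general mechanism (not HTML::Element's actual code), here is a sketch with a made-up Node class whose parent links are weakened with Scalar::Util::weaken. Because only the downward links are strong, the whole chain is released when the root variable goes out of scope, with no explicit delete call.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical node class: strong links to children, weak link to parent.
package Node {
    our $destroyed = 0;

    sub new {
        my ($class, $parent) = @_;
        my $self = bless { parent => $parent, children => [] }, $class;
        if ($parent) {
            Scalar::Util::weaken($self->{parent});    # no reference cycle
            push @{ $parent->{children} }, $self;
        }
        return $self;
    }

    sub DESTROY { $destroyed++ }
}

{
    my $root = Node->new;
    my $node = $root;
    $node = Node->new($node) for 1 .. 10;    # a chain of 10 nodes below the root
}    # $root and $node go out of scope here

print "nodes destroyed without calling delete: $Node::destroyed\n";
```

Without the weaken call each child would keep its parent alive and nothing would be freed at the end of the block, which is the situation an explicit delete was meant to handle.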
| [reply] [d/l] [select] |