in reply to Seemingly Valid HTML which crashes HTML::TreeBuilder::XPath

It's not a bug, but I'd say it's a bad design decision in HTML::Element to represent text nodes as plain strings instead of objects (XML::LibXML, for example, represents them as XML::LibXML::Text objects). It can be worked around by calling
$body->objectify_text;
before messing with its contents. See objectify_text in the HTML::Element documentation.
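
Here is a minimal sketch of what objectify_text changes, assuming HTML::TreeBuilder (from the HTML-Tree distribution) is installed. The describe_children helper and the inline HTML are made up for illustration. Before the call, the bare text child of <body> is a plain Perl string with no methods; afterwards it is an HTML::Element whose tag is '~text'.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;   # HTML-Tree distribution; builds HTML::Element trees

# Hypothetical helper: label each direct child of an element.
# Text children are plain strings (ref is empty); elements are objects.
sub describe_children {
    my ($elem) = @_;
    return map { ref $_ ? 'element:' . $_->tag : 'string' } $elem->content_list;
}

my $tree = HTML::TreeBuilder->new_from_content(
    '<html><body><p>foo</p><p>bar</p>trololo</body></html>'
);
my $body = $tree->look_down(_tag => 'body');

print join(',', describe_children($body)), "\n";  # element:p,element:p,string

$body->objectify_text;    # text nodes under $body become '~text' elements
print join(',', describe_children($body)), "\n";  # element:p,element:p,element:~text

$body->deobjectify_text;  # turn them back into plain strings
$tree->delete;            # HTML::Element trees should be freed explicitly
```

Code that walks the children and calls methods on each one will stumble over the plain string in the first case. In the second case every child is an object, but text must then be read from the 'text' attribute of the '~text' elements. Calling deobjectify_text restores the usual string representation.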


Re^2: Seemingly Valid HTML which crashes HTML::TreeBuilder::XPath
by mldvx4 (Hermit) on Nov 10, 2023 at 13:13 UTC

    Calling objectify_text just seems to invert the problem, though I can be rather obtuse and may not be using it the right way.

    I might be able to fit XML::LibXML into the full script in place of HTML::TreeBuilder::XPath. Here is my sketch:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::LibXML;

    # Parse the XHTML in __DATA__ as XML
    my $tree = XML::LibXML->load_xml(IO => \*DATA);

    # XPath context with a prefix bound to the XHTML default namespace;
    # without it, a plain //body would match nothing
    my $xpc = XML::LibXML::XPathContext->new( $tree->documentElement() );
    $xpc->registerNs( 'u' => 'http://www.w3.org/1999/xhtml' );

    for my $body ($xpc->findnodes('//u:body')) {
        # print $body->toString;
        # childNodes includes text nodes (e.g. "trololo"), not just elements
        for my $n ($body->childNodes()) {
            print $n->toString;
        }
    }
    print "\n";
    print "OK\n";
    exit(0);

    __DATA__
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <meta name="generator" content=
    "HTML Tidy for HTML5 for Linux version 5.6.0" />
    <title></title>
    </head>
    <body>
    <p>foo</p>
    <p>bar</p>
    trololo
    </body>
    </html>
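
The following variant of the sketch above sorts the children of <body> by node class. XML::LibXML returns text as XML::LibXML::Text objects rather than plain strings, so a text child like "trololo" can be told apart from element children with isa. The body_children helper is a hypothetical name, the inline XHTML string stands in for the __DATA__ section, and skipping whitespace-only text is only an illustrative choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: classify the children of every XHTML <body>.
sub body_children {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc->documentElement);
    $xpc->registerNs(u => 'http://www.w3.org/1999/xhtml');

    my @kinds;
    for my $body ($xpc->findnodes('//u:body')) {
        for my $n ($body->childNodes) {
            if ($n->isa('XML::LibXML::Element')) {
                push @kinds, 'element:' . $n->localname;
            }
            elsif ($n->isa('XML::LibXML::Text')) {
                # Text nodes are objects too; skip whitespace-only ones
                my $text = $n->data;
                push @kinds, 'text:' . $text if $text =~ /\S/;
            }
        }
    }
    return @kinds;
}

my $xhtml = '<html xmlns="http://www.w3.org/1999/xhtml"><head><title></title></head>'
          . '<body><p>foo</p><p>bar</p>trololo</body></html>';
print join(',', body_children($xhtml)), "\n";  # element:p,element:p,text:trololo
```

If only one kind of child is wanted, XPath can do the split instead. For example, $xpc->findnodes('u:*', $body) returns just the element children and $xpc->findnodes('text()', $body) returns just the text nodes, each evaluated relative to $body.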

      Your code as provided runs fine for me:

      $ perl 11155543.pl

      <p>foo</p>
      <p>bar</p>
      trololo

      OK
      $

      If that isn't what you want or expect, then you will also need to show what you do expect.


      🦛