in reply to Re^2: XML::LibXML::XPathContext->string_value - should ALL of the descendant's text be there?
in thread XML::LibXML::XPathContext->string_value - should ALL of the descendant's text be there?
Actually, I *do* believe that the text "World" belongs to the b element and, therefore, not to the p element.
Sure, that's up to you. I can't speak to how other modules implemented it, but I'd refer you to the libxml2 documentation, and the Document Object Model Specification for all the "official" details.
Anyway, I described two ways you can get the text nodes of the current node. Using the XPath expression I showed is probably easiest. I can't really say more since you haven't described what it is you're trying to do with the document.
use warnings;
use strict;
use XML::LibXML;

my $doc = XML::LibXML->load_xml( string => <<'EOT' );
<html>
  <head>
    <title>Title_Text</title>
  </head>
  <body>
    <p>paragraph_text</p>
    <div>
      <div>
        innnermost_text
      </div>
    </div>
  </body>
</html>
EOT

for my $node ($doc->findnodes('//*')) {
    print "<<<", $node->nodeName, ">>>\n";
    # Only the text nodes that are direct children of this element
    my @texts = map { $_->data } $node->findnodes('./text()');
    use Data::Dump;
    dd @texts; # Debug
}

__END__
<<<html>>>
(" \n ", " \n ", " ")
<<<head>>>
(" ", " ")
<<<title>>>
"Title_Text"
<<<body>>>
(" \n ", "\n ", " \n ")
<<<p>>>
"paragraph_text"
<<<div>>>
(" \n ", "\n ")
<<<div>>>
" \n innnermost_text\n "
You could also use XML::LibXML::SAX to get an event-based parser.
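In case the event-based route is of interest, here is a rough sketch of how it might look. The handler class OwnTextCollector and the helper own_text_by_element are names I made up for illustration; only XML::SAX::Base, XML::LibXML::SAX->new( Handler => ... ) and parse_string come from the modules themselves. It keeps a stack of open elements and appends each chunk of character data to whichever element is innermost at that moment, so every element ends up with only its "own" text.

```perl
use strict;
use warnings;

# Hypothetical handler (not part of any module): records, for each element,
# only the character data that appears directly inside it.
package OwnTextCollector;
use parent 'XML::SAX::Base';

sub start_document {
    my ($self) = @_;
    $self->{stack}   = [];   # currently open elements
    $self->{results} = [];   # [ name, own_text ] pairs, in closing order
}

sub start_element {
    my ($self, $el) = @_;
    push @{ $self->{stack} }, { name => $el->{Name}, text => '' };
}

sub characters {
    my ($self, $chars) = @_;
    return unless @{ $self->{stack} };
    # A parser may deliver one text run as several events, so append.
    $self->{stack}[-1]{text} .= $chars->{Data};
}

sub end_element {
    my ($self, $el) = @_;
    my $frame = pop @{ $self->{stack} };
    push @{ $self->{results} }, [ $frame->{name}, $frame->{text} ];
}

package main;
use XML::LibXML::SAX;

# Hypothetical helper: parse a string and return the collected pairs.
sub own_text_by_element {
    my ($xml) = @_;
    my $handler = OwnTextCollector->new;
    my $parser  = XML::LibXML::SAX->new( Handler => $handler );
    $parser->parse_string($xml);
    return $handler->{results};
}

for my $pair ( @{ own_text_by_element('<p>Hello <b>World</b>!</p>') } ) {
    printf "%s: %s\n", @$pair;
}
```

Elements are recorded when they close, so inner elements show up before their parents. For a document that already fits in memory the XPath approach above is probably simpler; SAX mostly pays off for large inputs.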
Re^4: XML::LibXML::XPathContext->string_value - should ALL of the descendant's text be there?
by bobn (Chaplain) on Aug 10, 2020 at 06:01 UTC