$node->string_value();
Note that this method is undocumented (there's a method with that name in XML::LibXML::NodeList, but your $nodes are XML::LibXML::Elements); you should use textContent instead.
Yes, that's true. I can't reconstruct exactly what happened when I wrote this code; I got into the documentation for an apparently unrelated module, where string_value was documented. I'm tempted to erase the whole thing.
However, this code of yours:
my @texts = map { $_->data } $node->findnodes('./text()');
actually shows *exactly* what I'm talking about: the "innermost_text" appears in the output ONLY for its innermost containing element, which is the last <div> element/node/whatever that you found with $doc->findnodes('//*'). It's not in every element that it is inside of, like <body> or <html>. That's what I was looking for! Thank you!!!
What I was working on: I've been doing some Python XHTML parsing, and the lxml documentation talks about "tail text". It's really weird - it says that text that follows an element's closing tag belongs to *that* element as "tail text" - NOT to the element that it is inside of. If you care, go to https://lxml.de/tutorial.html
and search on "document-style". Anyhow, I was testing in Perl to see if it had anything like that, which I don't see.
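For anyone who wants to see the tail-text behavior without reading the tutorial, here is a minimal sketch. It assumes Python's standard-library xml.etree.ElementTree in place of lxml (lxml.etree follows the same .text/.tail model), and the sample markup and variable names are made up for illustration, not taken from this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup: text appears both before and after child tags.
xml = "<body><div>inner<b>bold</b>tail of b</div>tail of div</body>"
root = ET.fromstring(xml)

# .text is the text right after an element's opening tag;
# .tail is the text right after that element's closing tag.
rows = [(el.tag, el.text, el.tail) for el in root.iter()]

for tag, text, tail in rows:
    print(f"{tag}: text={text!r} tail={tail!r}")
```

In this model "tail of div" is reported on the div element as its tail, not on body, even though it sits inside body. In a DOM-style API such as XML::LibXML, that same text would be a text-node child of body.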
As far as using SAX parsers, I've used something similar: HTML::Parser and XML::Parser work much the same way, I think. You create callbacks for events that happen during parsing. Having discovered XPath, the event-driven parser now seems to me like a crude, primitive approach. I'm sure there are still places it applies.
--Bob Niederman,
All code given here is UNTESTED unless otherwise stated.