Thanks. That was it. The manual page for XML::XPath::Node::Element had most of what was needed.
I'm still puzzled about the data structures, however.
The script below prints XML::XPath::Node::Attribute=REF(0x55bcf3d63f30) where I expected 'href'. The manual page says getAttributeNodes should return a list.
#!/usr/bin/perl
use HTML::Tidy;
use XML::XPath;
use strict;
use warnings;
my $body;
while (my $line = <DATA>) {
    $body .= $line;
}
my $tidy = HTML::Tidy->new({ output_xml => 1, numeric_entities => 1 });
my $clean = $tidy->clean($body);
my $parser = XML::XPath->new(xml => $clean);
my $set = '//p/a';
my $nodes = $parser->find($set);
foreach my $node ($nodes->get_nodelist) {
    print "\n";
    print $node->getName(), "\n";
    # this next line is wrong
    print join(", ", $node->getAttributeNodes), "\n";
}
exit(0);
__DATA__
<!doctype html>
<html class="no-focus-outline no-js " lang="en-US"
data-modal-active="true">
<head>
<title>test</title>
</head>
<body>
<h1>test heading</h1>
<div>
<p>paragraph one
<a href="https://example.com/one/two.html">one</a> example.</p>
<p>paragraph two
<a href="https://example.com/two/three.html">another</a> example.</p>
</div>
</body>
</html>
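A possible explanation, offered as a sketch rather than a definitive answer: getAttributeNodes does seem to return a list, but a list of XML::XPath::Node::Attribute objects, so join() stringifies each object instead of printing its name. Calling accessors on each attribute node should give the name and value. The helper name attribute_pairs and the inline XML string are made up for illustration, and HTML::Tidy is left out to keep the example small; getName and getNodeValue are the accessors as I read them in the XML::XPath::Node::Attribute manual page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Hypothetical helper (not part of XML::XPath): returns "name=value"
# strings for every attribute on the nodes matched by $path.
sub attribute_pairs {
    my ($xml, $path) = @_;
    my $xp = XML::XPath->new(xml => $xml);
    my @pairs;
    foreach my $node ($xp->find($path)->get_nodelist) {
        # Each element of this list is an attribute node object,
        # so ask it for its name and value instead of printing it.
        foreach my $attr ($node->getAttributeNodes) {
            push @pairs, $attr->getName . '=' . $attr->getNodeValue;
        }
    }
    return @pairs;
}

# Illustrative input only; the original script feeds tidied HTML instead.
my $xml = '<div><p>one <a href="https://example.com/one/two.html">one</a></p></div>';
print join(", ", attribute_pairs($xml, '//p/a')), "\n";
```

If only one known attribute is needed, $node->getAttribute('href') on the element may be simpler, assuming that method behaves as the XML::XPath::Node::Element manual page describes.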