Here's my go. It walks the tree recursively and, for each node, builds a list of "ancestors" using $ele->lineage. Calling $h->objectify_text first turns text nodes into objects, so they have ancestors too. Whenever it finds text that exactly matches, it prints out the list.
#!/usr/bin/perl
use warnings;
use strict;
use HTML::TreeBuilder;

my $h = HTML::TreeBuilder->new_from_content(
    do { local $/; <DATA> },
);
# Turn text nodes into ~text pseudo-elements so they can report a lineage.
$h->objectify_text;
my $text = q{text};
walk($h, $text);

sub walk {
    my $h    = shift;
    my $text = shift;
    for my $ele ($h->content_list) {
        # lineage runs from the parent up to the root; reverse it to get root first.
        my @lineage = $ele->lineage;
        my @ancestors;
        for my $ancestor (reverse @lineage) {
            push @ancestors, $ancestor->tag;
        }
        if (
            $ele->tag eq q{~text}
            and
            defined $ele->attr(q{text})    # defined, so text such as "0" still counts
            and
            $ele->attr(q{text}) eq $text
        )
        {
            printf(
                qq{%s\t},
                $_
            ) for @ancestors;
            printf(
                qq{found *%s* at depth %s\n},
                $ele->attr(q{text}),
                scalar @ancestors
            );
        }
        # Descend into this node's children.
        walk($ele, $text);
    }
}
__DATA__
<html><head><title>search</title></head>
<body>
<p>text</p>
<div>
<p>text</p>
</div>
</body></html>
Output:

html body p found *text* at depth 3
html body div p found *text* at depth 4
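
For what it's worth, here is a shorter sketch of the same idea that lets look_down do the searching instead of a hand-written recursive walk. It assumes objectify_text has already turned the text nodes into ~text pseudo-elements carrying a text attribute, and it uses lineage_tag_names to get the ancestor tag names directly. find_text_paths is just a name made up for this example, and joining the path with "/" is my own choice, not anything the module requires.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Return one [path, depth] pair per text node whose content equals $wanted.
# find_text_paths is a hypothetical helper, not part of HTML::TreeBuilder.
sub find_text_paths {
    my ($html, $wanted) = @_;

    my $tree = HTML::TreeBuilder->new_from_content($html);
    $tree->objectify_text;    # text nodes become ~text pseudo-elements

    my @found;
    for my $node ($tree->look_down(_tag => q{~text}, text => $wanted)) {
        # lineage_tag_names runs from the parent outward, so reverse it
        # to list the root first.
        my @ancestors = reverse $node->lineage_tag_names;
        push @found, [ join(q{/}, @ancestors), scalar @ancestors ];
    }

    $tree->delete;            # release the tree explicitly
    return @found;
}

my $html = <<'HTML';
<html><head><title>search</title></head>
<body>
<p>text</p>
<div>
<p>text</p>
</div>
</body></html>
HTML

printf qq{%s\tdepth %d\n}, @$_ for find_text_paths($html, q{text});
```

On the sample document this should find the same two matches as the recursive version. The trade-off is that look_down only matches whole attribute values here, so anything fancier (substring or regex matching) would need a code-ref criterion or the explicit walk.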