tangent has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to extract the text from a nested list of HTML using HTML::TreeBuilder::XPath:
<ul class="top">
  <li>Level 1
    <ul>
      <li>Level 2
        <ul>
          <li>Level 3</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
I can't work out how to get the text of just the current node. For example, this:
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
my @nodes = $tree->findnodes('//ul[@class="top"]');
for my $node (@nodes) {
    my @elems = $node->findnodes('li');
    for my $elem (@elems) {
        my $text = $elem->as_text;
        print "$text\n";
    }
}
gives me the following output, where all text values are concatenated:
Level 1 Level 2 Level 3
but what I want is:
Level 1
and then drill down to the next level to get that level's text.

Re: Using Xpath to extract text value of nested list elements
by choroba (Cardinal) on Mar 20, 2014 at 17:09 UTC
    Just use text(), the XPath node test (it only looks like a function call) that selects the text children of the current node:
    my $text = $elem->findvalue('text()');
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
      Thanks choroba, that does the job.
      Where did you find this pseudo-function in the XPath module? I could not find it among the methods the documentation presents, nor in the source, or perhaps I lack the enlightenment to see it. Could you please bring me the light?
        It isn't a method of the module. XPath is XPath: text() is part of the XPath specification itself :) There are 7 types of nodes, one of them is the text node, and text() selects text nodes :) http://www.w3.org/TR/xpath/
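
For anyone landing here later, this is a minimal sketch that puts the thread together: it uses text() to get each item's own text and then drills down into the nested lists. The helper names own_text and collect_levels are made up for this illustration, and the whitespace trimming is my own addition, assuming you don't want the spacing left over from the markup.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<ul class="top">
  <li>Level 1
    <ul>
      <li>Level 2
        <ul>
          <li>Level 3</li>
        </ul>
      </li>
    </ul>
  </li>
</ul>
HTML

# Hypothetical helper: text of one <li>, without the text of nested lists.
sub own_text {
    my ($li) = @_;
    my $text = $li->findvalue('text()');   # string value of the text-node children only
    $text =~ s/^\s+|\s+$//g;               # trim whitespace left over from the markup
    return $text;
}

# Hypothetical helper: walk a <ul> recursively and return [depth, text] pairs.
sub collect_levels {
    my ($ul, $depth) = @_;
    my @rows;
    for my $li ($ul->findnodes('./li')) {
        push @rows, [ $depth, own_text($li) ];
        # Drill down into any list nested directly inside this item.
        push @rows, collect_levels($_, $depth + 1) for $li->findnodes('./ul');
    }
    return @rows;
}

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);               # parse the whole string and finish the tree

my @levels = map { collect_levels($_, 0) } $tree->findnodes('//ul[@class="top"]');
print '  ' x $_->[0], $_->[1], "\n" for @levels;

$tree->delete;
```

With the HTML from the question this should print each level's text on its own line, indented by nesting depth. Returning [depth, text] pairs rather than printing inside the recursion is only a design choice to make the result easy to reuse.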