Using Xpath to extract text value of nested list elements

tangent has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to extract the text from a nested list of HTML using HTML::TreeBuilder::XPath:

<ul class="top">
    <li>Level 1
        <ul>
            <li>Level 2
                <ul>
                    <li>Level 3</li>
                </ul>
            </li>
        </ul>
    </li>
</ul>

I can't work out how to get the text of just the current node. For example, this:

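use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# $html holds the markup shown above
my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse($html);
$tree->eof;    # signal end of input so the tree is complete
my @nodes = $tree->findnodes('//ul[@class="top"]');
for my $node (@nodes) {
    # 'li' is relative to $node, so this matches only its direct <li> children
    my @elems = $node->findnodes('li');
    for my $elem (@elems) {
        # as_text returns the text of the element and all of its descendants
        my $text = $elem->as_text;
        print "$text\n";
    }
}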

gives me the following output, where all text values are concatenated:

Level 1 Level 2 Level 3

but what I want is:

Level 1

and then to drill down to the next level to get that level's text.


Re: Using Xpath to extract text value of nested list elements by choroba (Cardinal) on Mar 20, 2014 at 17:09 UTC
Just use the `text()` XPath pseudo-function which selects the text child: `my $text = $elem->findvalue('text()');`
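For the "drill down to the next level" part of the question, here is a minimal sketch of how `findvalue('text()')` might be combined with a recursive walk. The helper name `own_text_by_level` and the `@rows` structure are made up for illustration, and the whitespace trimming is an assumption about what output is wanted, not something the module does for you.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $html = <<'HTML';
<ul class="top">
    <li>Level 1
        <ul>
            <li>Level 2
                <ul>
                    <li>Level 3</li>
                </ul>
            </li>
        </ul>
    </li>
</ul>
HTML

# Hypothetical helper: collect [depth, own text] for every <li> under $ul.
sub own_text_by_level {
    my ($ul, $depth, $out) = @_;
    for my $li ($ul->findnodes('li')) {
        # text() matches only the text-node children of this <li>,
        # so text belonging to nested lists is left out.
        my $text = $li->findvalue('text()');
        $text =~ s/^\s+|\s+$//g;    # trim whitespace left over from the markup
        push @$out, [ $depth, $text ];
        # Descend into any <ul> nested directly inside this item.
        own_text_by_level($_, $depth + 1, $out) for $li->findnodes('ul');
    }
    return $out;
}

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($html);

my @rows;
for my $top ($tree->findnodes('//ul[@class="top"]')) {
    own_text_by_level($top, 1, \@rows);
}

# Indent each line by its nesting depth.
print '  ' x ($_->[0] - 1), $_->[1], "\n" for @rows;

$tree->delete;    # free the tree once the plain strings have been copied out
```

With the sample markup from the question this should print one line per level, indented by depth. If an item has text both before and after its nested list, `findvalue` joins those pieces into one string, so you may want `findnodes('text()')` instead if you need them separately.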
Re^2: Using Xpath to extract text value of nested list elements by tangent (Parson) on Mar 20, 2014 at 17:19 UTC
Thanks choroba, that does the job.
Re^2: Using Xpath to extract text value of nested list elements by cord-bin (Friar) on Mar 21, 2014 at 08:27 UTC
Where did you find this pseudo-function in the XPath module? I could not find it among the methods the documentation presents, nor in the source, or perhaps I lack the enlightenment to see it. Could you please bring me the light?
Re^3: Using Xpath to extract text value of nested list elements by Anonymous Monk on Mar 21, 2014 at 08:36 UTC
XPath is XPath: `text()` is not a method of the module, it is part of the XPath specification itself :) There are seven types of nodes, one of them is the text node, and `text()` selects text nodes :) http://www.w3.org/TR/xpath/
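To make that concrete, here is a small sketch that hands two different node tests to the same `findnodes` method: `text()` for text-node children and `*` for element children. The module just passes the expression to its XPath engine, which is why `text()` does not show up in the method list. The variable names are arbitrary and the cut-down markup is only for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content(
    '<ul class="top"><li>Level 1<ul><li>Level 2</li></ul></li></ul>'
);

# Take the first <li> directly under the top-level list.
my ($li) = $tree->findnodes('//ul[@class="top"]/li');

# Same method, different node tests from the XPath specification.
my @text_children    = $li->findnodes('text()');    # text nodes only
my @element_children = $li->findnodes('*');         # element nodes only

printf "text children: %d\n",    scalar @text_children;
printf "element children: %d\n", scalar @element_children;
print  "element tags: ", join(', ', map { $_->tag } @element_children), "\n";

# Copy the tag names out before the tree is freed.
my @element_tags = map { $_->tag } @element_children;
$tree->delete;
```

Here the `<li>` should have one text child ("Level 1") and one element child (the nested `ul`), which is exactly the split that `as_text` hides by flattening everything together.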