Mojo::DOM doesn't include marked-up text in an element's text

Cody Fendant has asked for the wisdom of the Perl Monks concerning the following question:

Here's a minimal example: say I have two paragraphs:

    <p>
        Paragraph one here.
    </p>

    <p>
        Paragraph 
            <b>two</b>
       here.
    </p>

And I use Mojo::DOM to grab their text:


    use Mojo::DOM;

    my $dom =
      Mojo::DOM->new('<p>Paragraph one here.</p><p>Paragraph <b>two</b> here.');
    for my $e ( $dom->find('p')->each ) {
        print $e->text,$/;
    }

    ### Output:
    # Paragraph one here.
    # Paragraph  here.
    #

How do I access that paragraph's complete text, including the text inside that second level of markup? And is this a bug or a feature?

Re: Mojo::DOM doesn't include marked-up text in an element's text by choroba (Cardinal) on Apr 22, 2020 at 22:51 UTC
It's a feature. XML::LibXML behaves similarly:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use feature qw{ say };

    use XML::LibXML;

    my $xml = '<r><p>Paragraph one here.</p><p>Paragraph <b>two</b> here.</p></r>';
    my $dom = 'XML::LibXML'->load_xml(string => $xml);

    print $dom->findvalue('/r/p[2]');  # Same as $dom->findnodes('/r/p[2]//text()')
    # Paragraph two here.

    print $dom->findnodes('/r/p[2]');  # Same as map $_->toString, $dom->findnodes('/r/p[2]')
    # <p>Paragraph <b>two</b> here.</p>

    print $dom->findnodes('/r/p[2]/text()');
    # Paragraph  here.

What do you mean by "complete text"?
Re^2: Mojo::DOM doesn't include marked-up text in an element's text by marto (Cardinal) on Apr 23, 2020 at 11:10 UTC
"What do you mean by "complete text"?"

What Anonymous Monk said below: the method Cody Fendant should have used to get the combined text of all descendant nodes is `all_text`, rather than `text`, so

    print $e->text,$/;

becomes

    print $e->all_text,$/;

Very handy, even for one-liners/ojo use.
Re: Mojo::DOM doesn't include marked-up text in an element's text (all_text) by Anonymous Monk on Apr 23, 2020 at 02:06 UTC
Always check assumptions against the docs (Mojo::DOM):

    all_text
      my $text = $dom->all_text;
    Extract text content from all descendant nodes of this element.

    text
      my $text = $dom->text;
    Extract text content from this element only (not including child elements).
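To see the two methods side by side, here is a minimal sketch that reuses the markup from the original question. The variable name `$second` is only illustrative, and it assumes a Mojolicious install recent enough to provide `all_text`:

```perl
use strict;
use warnings;
use Mojo::DOM;

# Same markup as in the original question
my $dom = Mojo::DOM->new(
    '<p>Paragraph one here.</p><p>Paragraph <b>two</b> here.</p>'
);

# Illustrative name: the second <p>, the one containing nested markup
my $second = $dom->find('p')->[1];

# text: only this element's own text nodes, so "two" is left out
print "text:     ", $second->text, "\n";

# all_text: text from every descendant node as well
print "all_text: ", $second->all_text, "\n";   # all_text: Paragraph two here.
```

The exact whitespace returned by `text` may differ between Mojolicious versions, so only the `all_text` result is shown in the comment.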
Re^2: Mojo::DOM doesn't include marked-up text in an element's text (all_text) by Cody Fendant (Hermit) on Apr 24, 2020 at 02:33 UTC
Thanks! And damn, I can't believe I didn't spot that in the documentation. To be fair, a simple "see `all_text`" in the documentation next to `text` would have saved me a lot of frustration!
Re^3: Mojo::DOM doesn't include marked-up text in an element's text (all_text) by hippo (Archbishop) on Apr 24, 2020 at 08:55 UTC
That's just one of the reasons why I miss AnnoCPAN so much.
Re^4: Mojo::DOM doesn't include marked-up text in an element's text (all_text) by marto (Cardinal) on Apr 24, 2020 at 08:58 UTC
Re^5: Mojo::DOM doesn't include marked-up text in an element's text (annocpan) by hippo (Archbishop) on Apr 24, 2020 at 10:35 UTC
Re^3: Mojo::DOM doesn't include marked-up text in an element's text (all_text) by marto (Cardinal) on Jul 16, 2020 at 18:04 UTC
merged.