Not the whitespace, but the newline character.

Newline is whitespace.

But both as_text and as_trimmed_text cut this newline. ... Is it possible to preserve the newline?

No, they don't. The whitespace is already gone before you call either of those methods, because it is compacted while the tree is being built. All you had to do was

$ perldoc HTML::TreeBuilder |grep -i space
Do not represent the text content of elements. This saves space if
$root->ignore_ignorable_whitespace(value)
whitespace text nodes in the tree. Default is true. (In fact, I'd be
$root->no_space_compacting(value)
This determines whether TreeBuilder compacts all whitespace strings
contiguous whitespace in the document is turned into a single space.
But that's not done if no_space_compacting is set to 1.
Setting no_space_compacting to 1 might be useful if you want to read
Redirects to HTML::Element:: delete_ignorable_whitespace

$ perldoc HTML::Element |grep -i space
$h->delete_ignorable_whitespace()
whitespace. You should not use this if $h under a 'pre' element.
"\t", or some number of spaces, if you specify it).
whitespace is deleted, and any internal whitespace is collapsed.
This will not remove hard spaces, unicode spaces, or any other non
ASCII white space unless you supplye the extra characters as a string
Tabs are expanded to however many spaces it takes to get to the next 8th

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::TreeBuilder;
use Test::More qw' no_plan ';

Main(@ARGV);
exit(0);

sub Main {
    is( OneT('<html><body></body></html>'),
        undef, 'no tag means undef not empty string' );
    is( OneT('<html><title></title><body></body></html>'),
        '', 'no content' );
    is( OneT('<html><title> </title><body></body></html>'),
        ' ', 'space' );
    is( OneT(qq'<html><title>a\nb</title><body></body></html>'),
        "a\nb", 'a newline b' );
} ## end sub Main

sub OneT {
    my ( $html, $expect, $name ) = @_;
    my $tree = HTML::TreeBuilder->new();
    $tree->no_space_compacting(1);    # must be set before parse()
    $tree->parse($html);
    return eval { $tree->look_down(qw' _tag title')->as_text };
} ## end sub OneT
__END__
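
For contrast, here is a minimal sketch that parses the same title twice, once with the default compaction and once with no_space_compacting(1). The helper name title_text and the sample HTML are made up for illustration; only the HTML::TreeBuilder and HTML::Element calls come from the modules themselves.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper (not from the post above): parse $html and return
# the <title> text, with whitespace compaction left on or switched off.
sub title_text {
    my ( $html, $keep_whitespace ) = @_;
    my $tree = HTML::TreeBuilder->new;

    # Set this before parse(): compaction happens while the tree is built,
    # so changing it afterwards cannot bring the newline back.
    $tree->no_space_compacting( $keep_whitespace ? 1 : 0 );
    $tree->parse($html);
    $tree->eof;

    my $title = $tree->look_down( _tag => 'title' );
    my $text  = defined $title ? $title->as_text : undef;
    $tree->delete;    # release the tree explicitly
    return $text;
}

my $html = "<html><title>a\nb</title><body></body></html>";
for my $keep ( 0, 1 ) {
    my $text = title_text( $html, $keep );
    $text =~ s/\n/\\n/g;    # make a surviving newline visible in the output
    print "no_space_compacting($keep): '$text'\n";
}
```

If the newline only shows up in the second line of output, that confirms it is the parser setting, not as_text or as_trimmed_text, that decides whether it survives.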

In reply to Re^3: HTML::Element newline character by Anonymous Monk
in thread HTML::Element newline character by usr345
