Neat. That's close, but it takes the first piece of the node, and only the first piece, not necessarily all of the pieces preceding a <br /> element. Consider foo02a in the following example.
#!/usr/bin/perl

use HTML::TreeBuilder::XPath;

use strict;
use warnings;

my $root = HTML::TreeBuilder::XPath->new;
$root->parse_file(\*DATA)
    or die("Could not parse the data: $!\n");
$root->eof();

my $xpath = '//div/p';

for my $d ($root->findnodes($xpath)) {
    my @line = $d->content_list;
    s/^\s+|\s+$//g for @line;
    $d->replace_with($line[0],qq(\n));
}

print $root->as_trimmed_text,qq(\n);

$root->delete;

exit(0);

__DATA__
<div><p>foo00 bar00</p></div>
<div><p>foo01<br />bar01</p></div>
<div>
  <p>
    foo02
    <br />
    bar02
  </p>
</div>
<div>
  <p>
    <a href="foobar01">foobar02</a>
    foo02a
    <br />
    bar02a
  </p>
</div>
<div>
  <p>
    foo03
    <br />
    bar03
    <br />
    baz03
  </p>
</div>
<div>
  <p>
    <em>foo04</em>
    <br />
    <strong>bar04</strong>
    <br />
    <em>baz04</em>
  </p>
</div>
<div>
  <p>
    <em>foo05</em>
  </p>
  <p>bar05
    <br />
    <em>baz05</em>
  </p>
</div>
I have been experimenting with $d->content_list. I suppose it would be possible to extract the node as a hash and then loop through it, eliminating the first <br /> element and everything after it.
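For what it's worth, here is a minimal sketch of that loop, working on the list that the node's children come back as rather than on a hash. The names truncate_at_br and first_lines are hypothetical helpers made up for illustration, not part of HTML::TreeBuilder::XPath. It assumes the <br /> is a direct child of the <p>; a <br /> nested inside an <em> or <a> would not be noticed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of HTML::TreeBuilder::XPath): keep only the
# children of $node that come before its first direct <br /> child.
sub truncate_at_br {
    my ($node) = @_;
    my @children = $node->detach_content;    # unlink every child, text or element
    my @keep;
    while (@children) {
        my $child = shift @children;
        # Text children are plain strings; element children are objects.
        if (ref $child && $child->tag eq 'br') {
            unshift @children, $child;       # the <br /> is discarded with the rest
            last;
        }
        push @keep, $child;
    }
    $_->delete for grep { ref } @children;   # free the discarded elements
    $node->push_content(@keep);              # reattach what preceded the <br />
    return $node;
}

# Hypothetical wrapper: return the truncated text of each //div/p node.
sub first_lines {
    my ($html) = @_;
    my $root = HTML::TreeBuilder::XPath->new;
    $root->parse($html);
    $root->eof;
    my @lines = map { truncate_at_br($_)->as_trimmed_text }
                $root->findnodes('//div/p');
    $root->delete;
    return @lines;
}

print "$_\n" for first_lines(<<'HTML');
<div><p>foo01<br />bar01</p></div>
<div><p><a href="foobar01">foobar02</a> foo02a <br /> bar02a</p></div>
HTML
```

Because the children are detached and only the leading ones pushed back, the <a> element and the text after it both survive in the foo02a case, which is where taking $line[0] alone falls short.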
In reply to Re^4: Truncating an HTML node using XPaths in HTML::TreeBuilder::XPath
by mldvx4
in thread Truncating an HTML node using XPaths in HTML::TreeBuilder::XPath
by mldvx4