tphyahoo has asked for the wisdom of the Perl Monks concerning the following question:

I want to get rid of white space between the <h1> and <h2> tags, and am trying to use HTML::Element->splice_content to do it. I guess I'm doing it wrong. Can someone tell me why?
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<h1>Blah</h1> &nbsp;<p>&nbsp;<br> <h2>Blah</h2>';
my $element_root = HTML::TreeBuilder->new_from_content($html);
$element_root->dump;
print "\n";
$element_root->splice_content(1,1);
$element_root->dump;
Outputs:
<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
  <body> @0.1 (IMPLICIT)
    <h1> @0.1.0
      "Blah"
    " á"
    <p> @0.1.2
      "á"
      <br> @0.1.2.1
    <h2> @0.1.3
      "Blah"

<html> @0 (IMPLICIT)
  <head> @0.0 (IMPLICIT)
UPDATE: Thanks to everyone below. Practically every suggestion was useful. In the end, I realized that the main mistake I was making was doing the splice from the root node, rather than the body. I'm still not 100% satisfied, but at least now the splice works. The code I'm using now is:
use strict;
use warnings;
use HTML::Treebuilder;  # sic: the module is HTML::TreeBuilder; the lowercase "b" loads only on case-insensitive filesystems

my $html = '<h1>Blah</h1> &nbsp;<p>&nbsp;<br> <h2>Blah</h2>';
my $element_root = HTML::TreeBuilder->new_from_content($html);
my $body = $element_root->look_down( _tag => "body");
$body->dump;
print "\n";
$body->splice_content(1,2);
$body->dump;
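
To make the root-versus-body mistake concrete, here is a minimal sketch that lists the children of each node before splicing. The variable names (`$root`, `@root_kids`, `@body_kids`, `@after`) are made up for illustration. It relies on the tree shape shown in the dump above, where the root <html> node holds only <head> and <body>, so offset 1 on the root is the whole body.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<h1>Blah</h1> &nbsp;<p>&nbsp;<br> <h2>Blah</h2>';
my $root = HTML::TreeBuilder->new_from_content($html);

# Children of the root <html> element; plain text children are shown as "text".
my @root_kids = map { ref $_ ? $_->tag : 'text' } $root->content_list;
print "root children: @root_kids\n";

# Children of <body>, which is where the <h1>, text, <p> and <h2> actually live.
my $body = $root->look_down(_tag => 'body');
my @body_kids = map { ref $_ ? $_->tag : 'text' } $body->content_list;
print "body children: @body_kids\n";

# Remove two children of <body>, starting at offset 1 (the text node and the <p>).
$body->splice_content(1, 2);
my @after = map { ref $_ ? $_->tag : 'text' } $body->content_list;
print "after splice:  @after\n";

$root->delete;    # free the tree explicitly
```

Hard-coded offsets like (1, 2) only fit this exact input; if the markup between the headings changes, the offsets have to change with it.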

Replies are listed 'Best First'.
Re: Help with HTML::Element->splice_content
by gu (Beadle) on Dec 02, 2005 at 19:00 UTC
Re: Help with HTML::Element->splice_content
by santonegro (Scribe) on Dec 02, 2005 at 21:16 UTC
    First, let's get clear on what you mean: do you want to get rid of white space or the non-breaking space entities?

    Regardless of which it is, the objectify_text method (which HTML::TreeBuilder objects inherit from HTML::Element) will probably be of use.

Re: Help with HTML::Element->splice_content
by santonegro (Scribe) on Dec 02, 2005 at 23:36 UTC
    OK, I have written a program that you can customize. Just set $target to what you want to wipe out of the text segments. For some reason &nbsp; does not match when I switch $target to it, and I don't know why.

    But as you can see, the "hi" was removed... I will ask on the libwww-perl list about this.

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    my $html = '<h1>Blah</h1>&nbsp;hithere&nbsp;<p>&nbsp;<br> <h2>Blah</h2>';
    my $element_root = HTML::TreeBuilder->new_from_content($html);
    $element_root->objectify_text;
    $element_root->dump;
    print "\n";

    my $target;
    $target = '&nbsp;';
    $target = 'hi';

    my @text = $element_root->look_down('_tag' => '~text');
    for my $text_node (@text) {
        my $tmp = $text_node->attr('text');
        warn $tmp;
        if ($tmp =~ m!$target!) {
            warn 'here';
            $tmp =~ s!$target!!g;
            $text_node->attr(text => $tmp);
        }
    }

    print "\n";
    $element_root->dump;
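
A likely reason the literal string '&nbsp;' never matches: by default HTML::TreeBuilder decodes entities while parsing, so the text nodes hold the non-breaking space character itself (code point 0xA0), not the six-character entity. The "á" in the dump output earlier in the thread is consistent with that byte being shown in a console code page. The following is a sketch under that assumption, matching on \xA0 instead; the names `$nbsp` and `@texts` are illustrative only.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<h1>Blah</h1>&nbsp;hithere&nbsp;<p>&nbsp;<br> <h2>Blah</h2>';
my $root = HTML::TreeBuilder->new_from_content($html);
$root->objectify_text;    # wrap each text segment in a '~text' pseudo-element

# Match the decoded character, not the entity source text.
my $nbsp = qr/\xA0/;

for my $text_node ($root->look_down(_tag => '~text')) {
    my $text = $text_node->attr('text');
    next unless $text =~ $nbsp;
    $text =~ s/$nbsp//g;
    $text_node->attr(text => $text);
}

# Collect what is left so the result can be inspected.
my @texts = map { $_->attr('text') } $root->look_down(_tag => '~text');
print "[$_]\n" for @texts;

$root->delete;
```

Note that this strips the character but can leave empty '~text' nodes behind; removing those nodes is a separate step.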
Re: Help with HTML::Element->splice_content
by santonegro (Scribe) on Dec 02, 2005 at 21:32 UTC
    Another thing: you can't be using TreeBuilder - you spelled the "b" with lowercase and it should be uppercase.

    At any rate, just break the tree into elements, and traverse or map through them and then ->detach() the ones that you want gone... example to follow.
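
The promised example does not appear in this thread, so here is a hedged sketch of what the detach approach might look like; it is not santonegro's code. It assumes the goal is to drop text nodes that contain nothing but whitespace and non-breaking spaces, and it uses objectify_text so that text segments become nodes that can be detached. The names `$root`, `$body` and `@kids` are illustrative.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $html = '<h1>Blah</h1> &nbsp;<p>&nbsp;<br> <h2>Blah</h2>';
my $root = HTML::TreeBuilder->new_from_content($html);

# Turn plain text segments into '~text' pseudo-elements so they can be detached.
$root->objectify_text;

for my $node ($root->look_down(_tag => '~text')) {
    # \xA0 is listed explicitly: \s does not reliably cover it for byte strings.
    next unless $node->attr('text') =~ /\A[\s\xA0]*\z/;
    $node->detach;    # unlink from its parent
    $node->delete;    # and discard it
}

# Convert the remaining '~text' nodes back into ordinary text segments.
$root->deobjectify_text;

my $body = $root->look_down(_tag => 'body');
my @kids = map { ref $_ ? $_->tag : 'text' } $body->content_list;
print "body children: @kids\n";

$root->delete;
```

This leaves the <p> (now holding only the <br>) in place; if that element should go too, it can be detached the same way once it has been located with look_down.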

      The lowercase "b" works on Windows but not on Linux. In any case, thanks for pointing that out, because it needs to work on both.