in reply to Stripping a-href tags from an HTML document
    use HTML::TreeBuilder;

    sub strip_a_elements {
        my( $html, $preserve_formatting ) = @_;
        my $t = HTML::TreeBuilder->new;
        if ( $preserve_formatting ) {
            $t->no_space_compacting(1);
            $t->ignore_ignorable_whitespace(0);
            $t->store_comments(1);
            $t->store_declarations(1);
            $t->store_pis(1);
        }
        $t->parse( $html )->eof;
        # we've parsed; now do the desired transformation:
        $_->replace_with_content for $t->find_by_tag_name('a');
        # and return the resulting hunk of html:
        $t->as_HTML
    }

Update... If you're sure you won't care about the formatting of the raw HTML, the above can be simplified to:

    sub strip_a_elements {
        my $t = HTML::TreeBuilder->new_from_content( $_[0] );
        $_->replace_with_content for $t->find_by_tag_name('a');
        $t->as_HTML
    }

Another idea I have is to make a new method in HTML::TreeBuilder (actually HTML::Element) for the purpose of removing elements like that.

    # remember that HTML::TreeBuilder inherits from HTML::Element.
    sub HTML::Element::strip_elements {
        my( $e, $tag ) = @_;
        $_->replace_with_content for $e->find_by_tag_name($tag);
        $e
    }

    # now we can write our subroutine like this:
    sub strip_a_elements {
        HTML::TreeBuilder
            ->new_from_content( $_[0] )
            ->strip_elements('a')
            ->as_HTML
    }

    # and call it:
    my $html_minus_links = strip_a_elements( $html );
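If you need to strip more than one kind of tag, the method idea extends naturally, because find_by_tag_name accepts a list of tag names. The sketch below is only my own variant, not part of HTML::Element: the multi-tag signature of strip_elements and the helper strip_tags_from_html are hypothetical names for illustration. It also calls delete on each detached node and on the tree once the HTML string has been produced. HTML::Tree has historically recommended explicit delete calls to free memory, and they are harmless otherwise.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical multi-tag variant of the strip_elements method idea:
# unwraps every element whose tag is in @tags, keeping its content.
sub HTML::Element::strip_elements {
    my( $e, @tags ) = @_;
    for my $node ( $e->find_by_tag_name(@tags) ) {
        $node->replace_with_content;   # splice children into the parent
        $node->delete;                 # free the now-empty, detached node
    }
    $e
}

# Hypothetical helper, for illustration only.
sub strip_tags_from_html {
    my( $html, @tags ) = @_;
    my $t   = HTML::TreeBuilder->new_from_content( $html );
    my $out = $t->strip_elements(@tags)->as_HTML;
    $t->delete;                        # explicitly free the whole tree
    return $out;
}

my $html = '<p>See <a href="/x"><font color="red">this page</font></a> now.</p>';
print strip_tags_from_html( $html, 'a', 'font' ), "\n";
```

Nested targets work because find_by_tag_name returns the outer element before the inner one, so the inner element still has a parent when its turn comes. Remember that as_HTML on a TreeBuilder object emits the implied html/head/body wrapper around the fragment.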
jdporter
The 6th Rule of Perl Club is -- There is no Rule #6.