Re: Is there a faster / more efficient / quicker or easier way to do this ?

This is the kind of job for which HTML::TreeBuilder was designed.

  use HTML::TreeBuilder;

  sub strip_a_elements
  {
    my( $html, $preserve_formatting ) = @_;
    my $t = HTML::TreeBuilder->new;
    if ( $preserve_formatting )
    {
      $t->no_space_compacting(1);
      $t->ignore_ignorable_whitespace(0);
      $t->store_comments(1);
      $t->store_declarations(1);
      $t->store_pis(1);
    }
    $t->parse( $html )->eof;
    # we've parsed; now do the desired transformation:
    $_->replace_with_content for $t->find_by_tag_name('a');
    # and return the resulting hunk of html:
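    my $out = $t->as_HTML;
    # free the tree explicitly; HTML::Element trees hold circular references
    $t->delete;
    return $out;
  }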

Update...

If you don't care about preserving the formatting of the raw HTML, the above can be simplified to:

  sub strip_a_elements
  {
    my $t = HTML::TreeBuilder->new_from_content( $_[0] );
    $_->replace_with_content for $t->find_by_tag_name('a');
    $t->as_HTML
  }

Another idea is to add a method to HTML::Element (from which HTML::TreeBuilder inherits) that strips every element with a given tag name while keeping its content in place.

  # remember that HTML::TreeBuilder inherits from HTML::Element.
  sub HTML::Element::strip_elements
  {
    my( $e, $tag ) = @_;
    $_->replace_with_content for $e->find_by_tag_name($tag);
    $e   # return the element itself so that calls can be chained
  }

  # now we can write our subroutine like this:
  sub strip_a_elements
  {
    HTML::TreeBuilder
    ->new_from_content( $_[0] )
    ->strip_elements('a')
    ->as_HTML
  }

  # and call it:
  my $html_minus_links = strip_a_elements( $html );
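
One caveat if you do care about formatting: called with no arguments, as_HTML encodes every "unsafe" character as an entity and leaves out optional end tags such as </p>. Here is a minimal sketch of a variant that asks for output closer to the input. The sub name strip_a_elements_verbose and the sample string are made up for illustration; the three arguments are as_HTML's documented ( $entities, $indent_char, \%optional_end_tags ).

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical variant of strip_a_elements with explicit as_HTML arguments.
sub strip_a_elements_verbose
{
  my( $html ) = @_;
  my $t = HTML::TreeBuilder->new_from_content( $html );
  $_->replace_with_content for $t->find_by_tag_name('a');
  # as_HTML( $entities, $indent_char, \%optional_end_tags ):
  #   '<>&' : encode only these three characters as entities
  #   undef : add no indentation
  #   {}    : treat no end tag as optional, so every end tag is emitted
  my $out = $t->as_HTML( '<>&', undef, {} );
  $t->delete;   # free the tree explicitly
  return $out;
}

print strip_a_elements_verbose(
  '<p>See <a href="http://example.com/">this page</a> for details.</p>'
), "\n";
```

TreeBuilder still wraps the fragment in html, head and body elements, so expect those in the result.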

jdporter
The 6th Rule of Perl Club is -- There is no Rule #6.
