in reply to How to extract untouched content of html tag with HTML::Parser

I need all html formatting to be untouched

Maybe printing the start and end tags found inside the div, along with the text, would give you what you want.

    sub start_handler {
        my ( $self, $tagname, $attr, $text ) = @_;

        # Wait for <div id="body">; ignore every other start tag.
        return unless $tagname eq 'div' and ( $attr->{id} // '' ) eq 'body';

        # Inside the div: print nested start tags and text verbatim.
        $self->handler( start => sub { print shift }, "text" );
        $self->handler( text  => sub { print shift }, "text" );

        # Stop at the first closing </div>; print any other end tag verbatim.
        $self->handler(
            end => sub {
                my ( $endtagname, $self, $text ) = @_;
                if ( $endtagname eq $tagname ) {
                    $self->eof;
                }
                else {
                    print $text;
                }
            },
            "tagname,self,text"
        );
    }

Re^2: How to extract untouched content of html tag with HTML::Parser
by Lana (Beadle) on Nov 28, 2010 at 17:35 UTC
    yeah!! thank you! it worked! I see my mistake :)
      FYI, the shift is not required: with the "text" argspec each handler receives a single argument, so you can simply print @_
Re^2: How to extract untouched content of html tag with HTML::Parser
by SneakZa (Initiate) on May 28, 2013 at 16:34 UTC
    How do you save the output to a variable so it can be used later?

      That depends on what you mean by later. Perhaps something like the following would work for you?

      sub start_handler {
          my ( $self, $tagname, $attr, $text ) = @_;
          my $variable = '';

          # Wait for <div id="body">; ignore every other start tag.
          return unless $tagname eq 'div' and ( $attr->{id} // '' ) eq 'body';

          # Inside the div: collect nested start tags and text verbatim.
          $self->handler( start => sub { $variable .= shift }, "text" );
          $self->handler( text  => sub { $variable .= shift }, "text" );

          # At the first closing </div>, hand the result over and stop parsing.
          $self->handler(
              end => sub {
                  my ( $endtagname, $self, $text ) = @_;
                  if ( $endtagname eq $tagname ) {
                      later($variable);
                      $self->eof;
                  }
                  else {
                      $variable .= $text;
                  }
              },
              "tagname,self,text"
          );
      }

      sub later {
          my ($variable) = @_;
          ## do something with $variable
      }
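If "later" means after the parse has finished, one option is to collect into a variable declared outside the handler. The following is only a minimal sketch of how the handler might be wired into a parser: the sample HTML string and the names $html, $captured and $parser are assumptions for illustration, not part of the original question.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input; only the contents of <div id="body"> are wanted.
my $html = '<html><body><div id="body"><p>Hello <b>world</b></p></div>'
         . '<p>ignored</p></body></html>';

# Declared at file scope so it is still available after parsing.
my $captured = '';

sub start_handler {
    my ( $self, $tagname, $attr ) = @_;

    # Wait for <div id="body">; ignore every other start tag.
    return unless $tagname eq 'div' and ( $attr->{id} // '' ) eq 'body';

    # Inside the div: collect nested start tags and text verbatim.
    $self->handler( start => sub { $captured .= shift }, "text" );
    $self->handler( text  => sub { $captured .= shift }, "text" );

    # Stop at the first closing </div>; collect any other end tag verbatim.
    $self->handler(
        end => sub {
            my ( $endtagname, $self, $text ) = @_;
            if ( $endtagname eq $tagname ) { $self->eof }
            else                           { $captured .= $text }
        },
        "tagname,self,text"
    );
}

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [ \&start_handler, "self,tagname,attr" ],
);

# parse() returns false when a handler has already called eof,
# so only signal the end of input if parsing ran to completion.
$parser->eof if $parser->parse($html);

print "$captured\n";
```

For the sample input, $captured should end up holding the contents of the div with the nested p and b tags intact. Like the handlers above, this stops at the first closing div tag, so a div nested inside the target would cut the capture short; tracking nesting depth in the start and end handlers would be one way around that.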