reneeb has asked for the wisdom of the Perl Monks concerning the following question:

Hi!

I want to "split" an HTML text with HTML::Parser.

I have this HTML text:
<p>This is a bad try to display text then code
<pre>#! usr/bin/perl
use strict;
use warnings;

print "Hello World!";</pre>
and then plain text again</p>

And I want to get an array which has three elements:
array_0_: This is a bad try to display text then code
array_1_: #! usr/bin/perl
   use strict;
   use warnings;

   print "Hello World!";
array_2_: and then plain text again

I've tried to solve this problem this way:
#! /usr/bin/perl

use strict;
use warnings;
use Data::Dumper;
use HTML::Parser;

my $pa = qq~
<p>This is a bad try to display text then code
<pre>#! usr/bin/perl
use strict;
use warnings;

print "Hello World!";</pre>
and then plain text again</p>~;

my $p = HTML::Parser->new();
$p->handler(start => \&start_handler,"tagname,self");
$p->parse($pa);

sub start_handler{
    my ($tag,$self) = @_;
    my $text = '';
    if($tag eq 'pre'){
        print "Pre:\n";
    }
    $self->handler(text => sub {$text .= shift},"dtext");
    $self->handler(end  => sub {my $tag = shift; print $text,"\n\n" if($tag eq 'pre' || $tag eq 'p');},"tagname");
}

But that fails. I just get the code, then the code again with "and then plain text again"...

Re: split html via HTML::Parser
by saskaqueer (Friar) on Feb 28, 2005 at 12:11 UTC
    #!/usr/bin/perl -w

    use strict;
    use HTML::Parser;

    my $parser = HTML::Parser->new(
        start_h => [ \&_starttag, 'self, tagname, attr' ],
        end_h   => [ \&_endtag,   'self, tagname' ],
        text_h  => [ \&_text,     'self, dtext' ]
    );

    my @chunks;
    $parser->parse_file(\*DATA);

    print "----------\n$_\n----------\n\n" for @chunks;

    sub _starttag {
        my ($self, $tag, $attr) = @_;
        $self->{'_pre'} = 1 if ($tag eq 'pre');
    }

    sub _endtag {
        my ($self, $tag) = @_;
        $self->{'_pre'} = undef if ($tag eq 'pre');
    }

    sub _text {
        my ($self, $dtext) = @_;

        $dtext =~ s/\A\s+//;
        $dtext =~ s/\s+\z//;
        return() unless ( length($dtext) > 0 and $dtext =~ /[^\s]/ );

        if ( defined($self->{'_pre'}) ) {
            push(@chunks, "PRE: $dtext");
        }
        else {
            push(@chunks, "TEXT: $dtext");
        }
    }

    __END__
    <p>This is a bad try to display text then code
    <pre>#! usr/bin/perl
    use strict;
    use warnings;

    print "Hello World!";</pre>
    and then plain text again</p>
      This works fine! Thanks!
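
For the exact three-element array asked for in the question, one possible variation on the reply above is to treat every <pre> start or end tag as a boundary and flush a text buffer there. The original attempt appears to fail because each start tag creates a fresh $text and replaces the text and end handlers, so whatever was collected before <pre> is dropped. What follows is only a sketch under some assumptions: the names $buffer, @chunks and flush_buffer are made up for illustration, the line breaks inside the sample <pre> block are assumed, and nested <pre> elements are not handled. Collecting into a buffer also means the result does not depend on whether the parser hands over a run of text in one callback or in several.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Sample input from the question; the line breaks inside <pre> are assumed.
my $html = <<'HTML';
<p>This is a bad try to display text then code
<pre>#! usr/bin/perl
use strict;
use warnings;

print "Hello World!";</pre>
and then plain text again</p>
HTML

my @chunks;        # finished pieces, in document order
my $buffer = '';   # text collected since the last boundary

# Hypothetical helper: store the collected text as one chunk,
# skipping buffers that hold nothing but whitespace.
sub flush_buffer {
    $buffer =~ s/\A\s+//;
    $buffer =~ s/\s+\z//;
    push @chunks, $buffer if length $buffer;
    $buffer = '';
}

my $parser = HTML::Parser->new(
    api_version => 3,
    # A <pre> start or end tag marks the boundary between two chunks.
    start_h => [ sub { flush_buffer() if $_[0] eq 'pre' }, 'tagname' ],
    end_h   => [ sub { flush_buffer() if $_[0] eq 'pre' }, 'tagname' ],
    # All other tags are ignored; their text just accumulates.
    text_h  => [ sub { $buffer .= $_[0] }, 'dtext' ],
);

$parser->parse($html);
$parser->eof;      # let the parser hand over any pending text
flush_buffer();    # keep whatever followed the last boundary

print "array_${_}_: $chunks[$_]\n" for 0 .. $#chunks;
```

With the sample input, @chunks should end up holding the leading text, the contents of the <pre> block, and the trailing text, in that order. Only leading and trailing whitespace is trimmed, so the line breaks inside the code are kept.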