mr_p has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I am in need of more help from everyone here.
I am trying to parse a nested tag from HTML using HTML::Parser and I am having problems. Below is my code. Please let me know what I am doing wrong.

    #!/usr/bin/perl
    use HTML::Parser;

    my $content=<<EOF;
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>Some title goes here</title>
    </head>
    <body bgcolor="#FFFFFF">
    <div id="leftcol"> menu column </div>
    <div id="body">
    <div class="content">
    <li>This is Line 1 </li>
    </div>
    <p>This is Line 2</p>
    </div>
    <div id="rightcol"> news column </div>
    </body>
    </html>
    EOF

    my $p = HTML::Parser->new( api_version => 3 );
    $p->handler( start => \&start_handler, "self,tagname,attr" );
    $p->parse($content);

    sub start_handler {
        my $self    = shift;
        my $tagname = shift;
        my $attr    = shift;
        my $text    = shift;
        return unless ( $tagname eq 'div' and $attr->{id} eq 'body' );
        $self->handler( start => sub { print shift }, "text" );
        $self->handler( text  => sub { print shift }, "text" );
        $self->handler(end => sub {
            my ($endtagname, $self, $text) = @_;
            if($endtagname eq $tagname) {
                $self->eof;
            }
            else {
                print $text;
            }
        }, "tagname,self,text");
    }

The output should be:

    <div id="body">
    <div class="content">
    <li>This is Line 1 </li>
    </div>
    <p>This is Line 2</p>
    </div>
    <div id="rightcol"> news column </div>

But the output is cut off before 'This is Line 2'.
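A possible explanation, offered as a sketch rather than a definitive answer: the end handler compares every closing tag against 'div', so the first `</div>` it meets (the one closing the nested `class="content"` div) triggers `$self->eof`, and parsing stops before 'This is Line 2'. One common way around this is to count how many divs are currently open and to stop only when the count returns to zero. The helper name `extract_div`, its arguments, and the `$depth` and `$done` variables below are hypothetical and not from the original post. The sketch also assumes the goal is the `<div id="body">` element itself, from its opening tag through its matching closing tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper (not from the original post): returns the source text
# of the first <div> whose id equals $id, including any nested <div>s.
sub extract_div {
    my ($html, $id) = @_;

    my $depth = 0;    # how many <div>s are open inside the target (0 = outside)
    my $done  = 0;    # set once the target's matching </div> has been seen
    my $out   = '';

    my $p = HTML::Parser->new( api_version => 3 );

    $p->handler( start => sub {
        my ($tagname, $attr, $text) = @_;
        return if $done;
        if ($depth == 0) {
            # Still looking for the target div; ignore everything else.
            return unless $tagname eq 'div'
                and defined $attr->{id}
                and $attr->{id} eq $id;
            $depth = 1;
        }
        elsif ($tagname eq 'div') {
            $depth++;    # a nested div opened inside the target
        }
        $out .= $text;
    }, "tagname,attr,text" );

    $p->handler( text => sub {
        my ($text) = @_;
        $out .= $text if $depth > 0 and not $done;
    }, "text" );

    $p->handler( end => sub {
        my ($tagname, $text) = @_;
        return if $done or $depth == 0;
        $out .= $text;
        # Only the </div> that brings the count back to zero ends the capture.
        if ($tagname eq 'div' and --$depth == 0) {
            $done = 1;
        }
    }, "tagname,text" );

    $p->parse($html);
    $p->eof;    # flush any buffered text
    return $out;
}

# Usage with a shortened version of the document from the question.
my $sample = '<body><div id="leftcol"> menu column </div>'
           . '<div id="body"><div class="content"><li>This is Line 1 </li></div>'
           . '<p>This is Line 2</p></div>'
           . '<div id="rightcol"> news column </div></body>';
print extract_div($sample, 'body'), "\n";
```

The sketch uses a `$done` flag instead of calling `eof` from inside a handler, so the parser still reads the rest of the document but the handlers ignore it. Collecting the text into a string rather than printing it directly also makes the result easier to check.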
Re: Removing nested div Tag from HTML
by metaperl (Curate) on Aug 18, 2011 at 20:06 UTC
by mr_p (Scribe) on Aug 18, 2011 at 20:13 UTC
by Anonymous Monk on Aug 18, 2011 at 20:55 UTC
by mr_p (Scribe) on Aug 18, 2011 at 21:38 UTC
by Anonymous Monk on Aug 19, 2011 at 12:42 UTC
by mr_p (Scribe) on Aug 19, 2011 at 15:02 UTC
by Anonymous Monk on Aug 19, 2011 at 15:15 UTC
by mr_p (Scribe) on Aug 19, 2011 at 15:18 UTC
by mr_p (Scribe) on Aug 19, 2011 at 21:27 UTC
by Anonymous Monk on Aug 20, 2011 at 03:10 UTC