mr_p has asked for the wisdom of the Perl Monks concerning the following question:
Hello Monks,
I am trying to parse HTML text while keeping the tags intact. The code I wrote using HTML::Parser strips out the tags.
Is there a way to keep the tags intact?
Below is my code:
#!/usr/bin/perl
package MyParser;
use base qw(HTML::Parser);

my $main_content = "";

sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    if ($tag =~ /^span$/i && $attr->{'class'} =~ /^main-content$/i) {
        # set if we find <span class="main-content"
        $content_flag = 1;
    }
}

sub text {
    my ($self, $text) = @_;
    # If we're in <H1>...</H1> or
    if ($content_flag) {
        $main_content .= $text;
    }
}

my $html = "
<html>
<head>
<title>Blah</title>
</head>
<span class=\"main-content\">
<bold_text> Here's the body 1 </bold_text>
<p> para1 </p>
<p> para2 </p>
</span>
</html>";

my $parser = MyParser->new;
$parser->parse("$html");
print "$main_content\n";
Output I get:
Here's the body 1 para1 para2
Output I need is:
<bold_text> Here's the body 1 </bold_text> <p> para1 </p> <p> para2 </p>
I would like the above output to still have tags, is it possible to do this with HTML::Parser?
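One possible approach, sketched here under some assumptions rather than offered as the definitive answer: with the version 2 style callbacks that a subclass gets by default, `start` and `end` both receive the original tag text as their last argument, so it can be appended alongside the text while the parser is inside the target span. The `mp_depth` and `mp_content` hash keys and the `extract_main_content` helper are names made up for this illustration. The depth counter assumes nested `<span>` elements are properly closed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package MyParser;
use base qw(HTML::Parser);

# State lives in the parser object itself; mp_depth and mp_content are
# hypothetical key names chosen for this sketch.

sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    if ($self->{mp_depth}) {
        # Already inside the target span: keep the tag exactly as written.
        $self->{mp_content} .= $origtext;
        $self->{mp_depth}++ if $tag eq 'span';    # track nested spans
    }
    elsif ($tag eq 'span' && ($attr->{class} || '') eq 'main-content') {
        $self->{mp_depth} = 1;                    # entered the target span
    }
}

sub end {
    my ($self, $tag, $origtext) = @_;
    return unless $self->{mp_depth};
    $self->{mp_depth}-- if $tag eq 'span';
    # The closing tag of the target span itself is not kept.
    $self->{mp_content} .= $origtext if $self->{mp_depth};
}

sub text {
    my ($self, $text) = @_;
    $self->{mp_content} .= $text if $self->{mp_depth};
}

package main;

# Hypothetical helper: returns everything inside <span class="main-content">.
sub extract_main_content {
    my ($html) = @_;
    my $parser = MyParser->new;
    $parser->parse($html);
    $parser->eof;    # flush any buffered text
    return defined $parser->{mp_content} ? $parser->{mp_content} : '';
}

my $html = <<'HTML';
<html>
<head>
<title>Blah</title>
</head>
<span class="main-content">
<bold_text> Here's the body 1 </bold_text>
<p> para1 </p>
<p> para2 </p>
</span>
</html>
HTML

print extract_main_content($html), "\n";
```

The main difference from the code in the question is that `start` and `end` also append `$origtext`, and that an `end` handler turns collection off again once the span closes. Without that handler, everything after the span would be collected as well.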
Re: Parsing HTML with tags intact
by roboticus (Chancellor) on Jan 06, 2011 at 23:52 UTC
Re: Parsing HTML with tags intact
by Anonyrnous Monk (Hermit) on Jan 06, 2011 at 23:51 UTC
Re: Parsing HTML with tags intact
by ww (Archbishop) on Jan 07, 2011 at 03:28 UTC
by mr_p (Scribe) on Jan 07, 2011 at 13:51 UTC
by ww (Archbishop) on Jan 07, 2011 at 14:17 UTC