Should I just use HTML::Parser and shut up?
Here is a filter example to get you going. It's really quite easy once you get your head around how it works. I find the POD a little obscure, but there are some good tutorials out there.
You should easily see how we check each opening and closing tag and add it to the output if it is on the OK list: the parser calls &start for opening tags and &end for closing tags. Similarly, the parser calls &text for the text between tags, and we add that text only if the most recent opening tag was on the OK list (that is what the $want_it flag records). If you just want the text, just don't add the tags. What could be easier?
    #!/usr/bin/perl -w

    package Filter;
    use strict;
    use base 'HTML::Parser';

    my ($filter, $want_it);

    # tags that are allowed through the filter
    my @ok_tags = qw ( h1 h2 h3 h4 p br );
    my %ok_tags;
    $ok_tags{$_}++ for @ok_tags;

    # called by the parser for each opening tag
    sub start {
        my ($self, $tag, $attr, $attrseq, $origtext) = @_;
        if ( exists $ok_tags{$tag} ) {
            $filter .= $origtext;
            $want_it = 1;
        }
        else {
            $want_it = 0;
        }
    }

    # called by the parser for text; kept only if the last opening tag was OK
    sub text {
        my ($self, $text) = @_;
        $filter .= $text if $want_it;
    }

    # called by the parser for comments
    sub comment {
        # uncomment the lines below to keep comments instead of stripping them
        # my ($self, $comment) = @_;
        # $filter .= "<!-- $comment -->";
    }

    # called by the parser for each closing tag
    sub end {
        my ($self, $tag, $origtext) = @_;
        $filter .= $origtext if exists $ok_tags{$tag};
    }

    my $parser = Filter->new;
    my $html = join '', <DATA>;
    $parser->parse($html);
    $parser->eof;

    print $html;
    print "\n\n------------------------\n\n";
    print $filter;

    __DATA__
    <html>
    <head>
    <title>Title</title>
    </head>
    <body>
    <h1>Hello Parser</h1>
    <p>You need HTML::Parser</p>
    <h2>Parser rocks!</h2>
    <a href="html.parser.com">html.parser.com</a>
    <hr>
    <pre>
    use HTML::Parser;
    </pre>
    <!-- HTML PARSER ROCKS! -->
    </body>
    </html>
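For the "just the text" case mentioned above, here is a minimal sketch of what that variant might look like. It assumes the same subclassing approach as the filter; the package name TextOnly, the variable $text_only and the sample input string are made up for illustration and are not from the original example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package TextOnly;    # hypothetical name, chosen for this sketch
use base 'HTML::Parser';

# same OK list as the filter example
my %ok_tags = map { $_ => 1 } qw( h1 h2 h3 h4 p br );
my ($text_only, $want_it) = ('', 0);

# opening tags only set the flag; the tag itself is never added
sub start {
    my ($self, $tag) = @_;
    $want_it = exists $ok_tags{$tag} ? 1 : 0;
}

# keep text only while the last opening tag was on the OK list
sub text {
    my ($self, $text) = @_;
    $text_only .= $text if $want_it;
}

package main;

my $parser = TextOnly->new;
$parser->parse('<h1>Hello Parser</h1><a href="x">link</a><p>You need HTML::Parser</p>');
$parser->eof;
print $text_only, "\n";
```

There is no &end here because closing tags contribute nothing to text-only output. As in the filter, the flag only remembers the most recent opening tag, so text that follows a disallowed tag nested inside an allowed one is dropped.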
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
In reply to Re: Tag filtering: a standard mechanism? by tachyon, in thread Tag filtering: a standard mechanism? by thpfft