in reply to Tag filtering: a standard mechanism?
Should I just use HTML::Parser and shut up?
Here is a filter example to get you going - its really quite easy once you get you head around how it works. I find the pod a little obscure but there are some good tutorials out there.
You should easily see how we check each opening and closing tag and add it if it is on the ok list - parser calls &start for opening tags and &end for closing tags. Similarly we add the text between the OK opening and closing tags as parser calls &text and we have flagged that we do or don't want this text. If you just want the text just don't add the tags. What could be easier?
#!/usr/bin/perl -w package Filter; use strict; use base 'HTML::Parser'; my ($filter, $want_it); my @ok_tags = qw ( h1 h2 h3 h4 p br ); my %ok_tags; $ok_tags{$_}++ for @ok_tags; sub start { my ($self, $tag, $attr, $attrseq, $origtext) = @_; if ( exists $ok_tags{$tag}) { $filter .= $origtext; $want_it = 1; } else { $want_it = 0; } } sub text { my ($self, $text) = @_; $filter .= $text if $want_it; } sub comment { # uncomment to no strip comments # my ($self, $comment) = @_; # $filter .= "<!-- $comment -->"; } sub end { my ($self, $tag, $origtext) = @_; $filter .= $origtext if exists $ok_tags{$tag}; } my $parser = new Filter; my $html = join '', <DATA>; $parser->parse($html); $parser->eof; print $html; print "\n\n------------------------\n\n"; print $filter; __DATA__ <html> <head> <title>Title</title> </head> <body> <h1>Hello Parser</h1> <p>You need HTML::Parser</p> <h2>Parser rocks!</h2> <a href="html.parser.com">html.parser.com</a> <hr> <pre> use HTML::Parser; </pre> <!-- HTML PARSER ROCKS! --> </body> </html>
cheers
tachyon
s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Re: Tag filtering: a standard mechanism?
by thpfft (Chaplain) on Sep 13, 2001 at 18:59 UTC |