Description: |
Makes HTML::TokeParser return a list when get_tag and get_token are called in list context. Other than that, it behaves exactly like iterating with a while loop. It's to enable me to say:
my @links = map { $_->[1]{href} } $parser->get_tag('a');
And expect it to work. Sating the addiction of map-junkies. :)
This code would possibly be better applied to HTML::PullParser, but then I'd have to reimplement get_tag and do some other stuff, which I don't want to do. I think, anyway. |
package HTML::TokeParser::Listerine;
use strict;
use warnings;
use base 'HTML::TokeParser';
sub get_tag {
    my $self = shift;
    if (wantarray) {
        # build and return a list
        my @tags;
        while ( my $tag = $self->SUPER::get_tag(@_) ) {    # delegate to superclass
            push @tags, $tag;
        }
        return @tags;
    }
    else { return $self->SUPER::get_tag(@_) }
}
sub get_token {
    my $self = shift;
    if (wantarray) {
        # build and return a list
        my @tokens;
        while ( my $token = $self->SUPER::get_token(@_) ) {    # delegate to superclass
            push @tokens, $token;
        }
        return @tokens;
    }
    else { return $self->SUPER::get_token(@_) }
}
1;
__END__
=pod
=head1 NAME
HTML::TokeParser::Listerine - Context-sensitive HTML token parsing
=head1 SYNOPSIS
    use HTML::TokeParser::Listerine;
    my $html = q {
        <html>
        <body>
        <!-- Match my comment, and include it -->
        <!-- in the output of get_token -->
        <a href="http://www.foo.com">Bar</a><br />
        <a href="http://www.bar.com">Foo</a><br />
        </body>
        </html>
    };
    my $p = HTML::TokeParser::Listerine->new(\$html);
    # magically parse html with map rather than tedious while!
    # you could also use get_token to do this
    my @links = map { $_->[1]->{href} } $p->get_tag('a');
    print "Links are: ", join("\n", @links), "\n";
=head1 DESCRIPTION
HTML::TokeParser::Listerine overrides the C<get_tag> and C<get_token>
methods of HTML::TokeParser to make them DWIM in list context, for
example the one provided by the C<grep> and C<map> operators. This
allows you to do terse, complex filtering, rather than having to write
a big while loop every time you want to parse HTML, which isn't easy on
the eye.
Obviously, this is no faster than doing it with a while loop, as
internally it uses the same mechanism, and it builds the whole list in
memory before returning it. It simply saves you typing, and that can be
a lot more convenient than you think.
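
For comparison, here is roughly the while loop this module saves you
from writing. It is only a sketch using plain HTML::TokeParser; the
C<$html> sample and the C<@links> name are made up for illustration.

    use strict;
    use warnings;
    use HTML::TokeParser;

    # illustrative sample document, not taken from any real page
    my $html = q{<a href="http://www.foo.com">Bar</a> <a href="http://www.bar.com">Foo</a>};
    my $p    = HTML::TokeParser->new(\$html);

    # collect every href by hand, one get_tag call per iteration
    my @links;
    while ( my $tag = $p->get_tag('a') ) {
        # a start tag comes back as [ $name, \%attr, \@attrseq, $text ]
        push @links, $tag->[1]{href};
    }
    print "$_\n" for @links;

With this module the loop collapses into a single C<map> over
C<get_tag('a')>, as shown in the SYNOPSIS.
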
=head1 METHODS
The only difference from HTML::TokeParser is that if you call the
methods C<get_tag> and C<get_token> in list context, they return a list
of all the remaining matching tags and all the remaining tokens,
respectively. Calling them in scalar context should behave the same as
vanilla TokeParser does.
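
A small sketch of the two contexts side by side, assuming this module
is installed somewhere on C<@INC>; the sample HTML and the variable
names are illustrative only. Note that a list-context call drains the
parser, so later calls find nothing left.

    use strict;
    use warnings;
    use HTML::TokeParser::Listerine;

    # illustrative sample document
    my $html = q{<a href="/one">1</a> <a href="/two">2</a> <a href="/three">3</a>};
    my $p    = HTML::TokeParser::Listerine->new(\$html);

    my $first = $p->get_tag('a');    # scalar context: just the next tag
    my @rest  = $p->get_tag('a');    # list context: every remaining <a> tag
    my $none  = $p->get_tag('a');    # undef, the stream is already exhausted

    print $first->[1]{href}, "\n";                           # /one
    print join( ' ', map { $_->[1]{href} } @rest ), "\n";    # /two /three
    print defined $none ? "more\n" : "done\n";               # done

Remember that argument lists (for example those of C<print>) are list
context too, so wrap the call in C<scalar(...)> if you only want one
tag there.
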
=head1 AUTHOR
Amoe.
=head1 REQUIREMENTS
HTML::TokeParser and everything it depends on.
=head1 SEE ALSO
HTML::TokeParser and HTML::PullParser manpages.
=cut
|