in reply to Problem with parsing HTML with Regex's
It is not less code than doc's, but it is much more readable. You might want to throw in error checking for missing href and src attributes.

use strict;
use warnings;
use HTML::TokeParser::Simple;
use Data::Dumper;

my $parser = HTML::TokeParser::Simple->new(\*DATA);
my (@img,@link,@a);

while (my $token = $parser->get_token) {
    if ($token->is_start_tag('img')) {
        push @img, $token->return_attr->{src};
    }
    elsif ($token->is_start_tag('link')) {
        push @link, $token->return_attr->{href};
    }
    elsif ($token->is_start_tag('a')) {
        push @a, $token->return_attr->{href};
    }
}

print Dumper \@img,\@link,\@a;

__DATA__
<A href=normal.link2 class="foo" >
<img src="img.link2" alt="foo">
<a class=foo href='normal.link4'>
<img height=20 width=25 src=img.link3 >
<IMG src='img.link4'>
<link href="css.link1">
<a class=foo href="normal.link1">
<img src="img.link1">
<a href="normal.link3">
<a Href='normal.link5'>
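The error checking mentioned above could look roughly like the sketch below. It is only one way to do it, not a drop-in replacement: the %url_attr lookup table, the @missing list, and the sample markup are made up for illustration, and it assumes a version of HTML::TokeParser::Simple that provides get_tag and get_attr (older releases spell the latter return_attr).

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical lookup table: tag name => attribute that should hold the URL.
my %url_attr = (img => 'src', link => 'href', a => 'href');
my (%found, @missing);

# Sample markup invented for this sketch; two tags lack their URL attribute.
my $html = <<'END_HTML';
<a href="normal.link1">
<a name="anchor-only">
<img src="img.link1">
<IMG alt="no source">
<link href="css.link1">
END_HTML

my $parser = HTML::TokeParser::Simple->new(\$html);

while (my $token = $parser->get_token) {
    next unless $token->is_start_tag;
    my $tag  = lc $token->get_tag;
    my $attr = $url_attr{$tag} or next;    # skip tags we do not collect
    my $url  = $token->get_attr($attr);

    if (defined $url && length $url) {
        push @{ $found{$tag} }, $url;
    }
    else {
        # Record the problem instead of pushing undef onto the result list.
        push @missing, "<$tag> without $attr";
    }
}

print "$_: @{ $found{$_} }\n" for sort keys %found;
warn "$_\n" for @missing;
```

Driving the loop from a table keeps the three branches in one place, so adding another tag is a one-line change. Whether a missing attribute should warn, die, or be silently skipped depends on what you do with the links afterwards.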
jeffa
L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)
Re: Re: Problem with parsing HTML with Regex's
by PodMaster (Abbot) on Nov 10, 2003 at 13:56 UTC