I'm using the following bit of code to list out all the links in an HTML page, but I'd like it to ignore <a> tags that are immediately preceded by something like <!--ignore-->.
So <a href="http://perlmonks.org"> would get listed, but <!--ignore--><a href="http://perlmonks.org"> would not. Any suggestions?
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

-e $ARGV[0] or die "File does not exist: $ARGV[0]\n";

my $p = HTML::TokeParser->new( shift );
while ( my $token = $p->get_tag("a") ) {
    my $url  = $token->[1]{href} || "-";
    my $text = $p->get_trimmed_text("/a");
    print "$url\n";
}
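One possible approach, sketched here and not tested against your actual pages: get_tag only ever hands back start and end tags, so it never sees the comment. Switching to get_token lets you walk every token (start tags, end tags, text, comments) and remember whether the last meaningful token was the ignore comment. The list_links sub, the $IGNORE_MARKER regex, and the choice to let whitespace sit between the comment and the <a> are my own assumptions, not anything HTML::TokeParser provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Assumed marker format: <!--ignore-->, tolerant of case and inner spaces.
my $IGNORE_MARKER = qr/^<!--\s*ignore\s*-->$/i;

# Hypothetical helper: return the href of every <a> start tag that is not
# directly preceded by the ignore comment.
# $source is a file name or a reference to a string of HTML.
sub list_links {
    my ($source) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Can't open $source: $!\n";

    my @urls;
    my $skip_next = 0;    # set right after an ignore comment
    while ( my $token = $p->get_token ) {
        my $type = $token->[0];
        if ( $type eq 'C' ) {
            # Comment token: [ 'C', $text ], where $text includes <!-- -->
            $skip_next = $token->[1] =~ $IGNORE_MARKER ? 1 : 0;
        }
        elsif ( $type eq 'T' && $token->[1] =~ /^\s*$/ ) {
            # Whitespace-only text keeps a pending ignore alive
            # (a design choice; drop this branch to be strict)
            next;
        }
        elsif ( $type eq 'S' && $token->[1] eq 'a' ) {
            # Start tag token: [ 'S', $tag, $attr, $attrseq, $text ]
            push @urls, $token->[2]{href} || '-' unless $skip_next;
            $skip_next = 0;
        }
        else {
            # Any other token (real text, other tags) cancels a pending ignore
            $skip_next = 0;
        }
    }
    return @urls;
}

# Only act as a command-line script when a file name is given
if (@ARGV) {
    my $file = $ARGV[0];
    -e $file or die "File does not exist: $file\n";
    print "$_\n" for list_links($file);
}
```

Run it the same way as your original script, with the HTML file as the argument. Because HTML::TokeParser->new also accepts a reference to a string, you can check the behaviour quickly with something like list_links(\$html).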
In reply to Skipping HTML tags with HTML::TokeParser by patgas