patgas has asked for the wisdom of the Perl Monks concerning the following question:
I'm using the following bit of code to list out all the links in an HTML page, but I'd like it to ignore <a> tags that are immediately preceded by something like <!--ignore-->.
So <a href="http://perlmonks.org"> would get listed, but <!--ignore--><a href="http://perlmonks.org"> would not. Any suggestions?
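```perl
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser;

# Expect one argument: the HTML file to scan.
@ARGV or die "Usage: $0 file.html\n";
-e $ARGV[0] or die "File does not exist: $ARGV[0]\n";

my $p = HTML::TokeParser->new( shift );

# get_tag("a") returns each <a> start tag as [$tag, \%attr, \@attrseq, $text].
while ( my $token = $p->get_tag("a") ) {
    my $url  = $token->[1]{href} || "-";
    my $text = $p->get_trimmed_text("/a");    # link text (not used below)
    print "$url\n";
}
```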
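One possible approach, offered as a sketch under stated assumptions rather than as any replier's answer: switch from `get_tag` to `get_token`, which returns every token including comments (as `["C", $text]`, where `$text` keeps the `<!--` and `-->` delimiters), and remember whether the token just before each `<a>` start tag was an ignore comment. The sub name `links_not_ignored` is hypothetical, and letting whitespace-only text between the comment and the `<a>` pass through is an assumption of this sketch; drop that branch if "immediately preceded" should be strict.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the href of every <a> start tag, skipping any
# <a> whose preceding token (ignoring whitespace-only text, an assumption)
# is an <!--ignore--> comment. $source is a filename or a ref to an HTML string.
sub links_not_ignored {
    my ($source) = @_;
    my $p = HTML::TokeParser->new($source)
        or die "Cannot open $source: $!\n";
    my @links;
    my $skip_next = 0;    # true right after an <!--ignore--> comment

    while ( my $token = $p->get_token ) {
        my $type = $token->[0];
        if ( $type eq 'C' ) {
            # Comment tokens carry their raw text, delimiters included.
            $skip_next = $token->[1] =~ /^<!--\s*ignore\s*-->$/i ? 1 : 0;
        }
        elsif ( $type eq 'T' and $token->[1] =~ /^\s*$/ ) {
            # Assumption: whitespace between the comment and <a> keeps the flag.
            next;
        }
        elsif ( $type eq 'S' and $token->[1] eq 'a' ) {
            # Start tokens are ["S", $tag, \%attr, \@attrseq, $text].
            push @links, $token->[2]{href} || '-' unless $skip_next;
            $skip_next = 0;
        }
        else {
            # Any other token (real text, other tags) resets the flag.
            $skip_next = 0;
        }
    }
    return @links;
}

# Command-line use: list the links in each file named on the command line.
for my $file (@ARGV) {
    -e $file or die "File does not exist: $file\n";
    print "$_\n" for links_not_ignored($file);
}
```

Saved as, say, `links.pl`, it runs as `perl links.pl page.html`. Because `HTML::TokeParser->new` also accepts a reference to a scalar as literal document text, `links_not_ignored(\$html)` works for HTML already held in a string.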
(ichimunki) Re: Skipping HTML tags with HTML::TokeParser
by ichimunki (Priest) on Jul 31, 2001 at 22:02 UTC
by Anonymous Monk on Mar 14, 2002 at 10:04 UTC
by ichimunki (Priest) on Mar 14, 2002 at 19:14 UTC
Re: Skipping HTML tags with HTML::TokeParser
by voyager (Friar) on Jul 31, 2001 at 20:27 UTC
Re: Skipping HTML tags with HTML::TokeParser
by mexnix (Pilgrim) on Jul 31, 2001 at 20:32 UTC