in reply to Question Regarding Regular Expressions and Negated Character Classes
I don't think a single regular expression is the best way to solve this problem, since a regex either matches or it doesn't: you can't print or log anything about why there is no result. The shortest solution I would want to use still needs three regexen:
if ($html =~ m#<noframes>(.+?)</noframes>#is) {
    @urls = grep { $_ !~ m#microsoft\.com|netscape\.com# }
            $& =~ m#<a +href="([^"]+)"#gi;
}
However, there is still an issue with this: it assumes that "href" directly follows "<a", which is by no means necessary. A slightly longer, but more readable and clearer, solution is the following:
if ($html =~ m#<noframes>(.+?)</noframes>#is) {
    $noframes = $1;
} else {
    die "Couldn't find noframes tags";
}

# \s after "<a" keeps other tags such as <area> or <abbr> from matching
while ($noframes =~ m#<a\s[^>]+>#gis) {
    my $link = $&;
    # href may appear anywhere inside the tag
    my ($url) = $link =~ m#href\s*=\s*"([^"]+)"#i;
    if ($url and $url !~ m#microsoft\.com|netscape\.com#i) {
        push (@urls, $url);
    }
}
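For what it's worth, here is a minimal, self-contained sketch of the same approach that can be run as is. The sub name noframes_urls, the sample HTML and the example.org URL are made up for illustration; they are not from the original question. The second link puts another attribute before "href", which is exactly the case the shorter version misses.

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): returns the URLs of
# all links inside the first <noframes> block, minus the browser vendors.
sub noframes_urls {
    my ($html) = @_;

    # Step 1: isolate the <noframes> block, or say why we found nothing.
    my ($noframes) = $html =~ m#<noframes>(.+?)</noframes>#is
        or die "Couldn't find noframes tags";

    my @urls;
    # Step 2: walk over every <a ...> tag; \s keeps <area>, <abbr> etc. out.
    while ($noframes =~ m#(<a\s[^>]*>)#gi) {
        my $link = $1;
        # Step 3: pull out href, wherever it sits inside the tag.
        my ($url) = $link =~ m#href\s*=\s*"([^"]+)"#i;
        push @urls, $url
            if $url and $url !~ m#microsoft\.com|netscape\.com#i;
    }
    return @urls;
}

# Made-up sample input.
my $html = <<'HTML';
<frameset><frame src="main.html"></frameset>
<noframes>
  <a href="http://www.microsoft.com/ie/">Get IE</a>
  <a class="ext" href="http://example.org/page.html">No-frames page</a>
</noframes>
HTML

print "$_\n" for noframes_urls($html);   # prints http://example.org/page.html
```

Because the steps are separate, each one can fail with its own message, which is the point made above about a single regex not being able to tell you why nothing matched.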
I didn't come up with a regex-only solution, mainly for lack of time, but the only reason I would solve this problem entirely with a regex would be to develop regex skills ;) Hope this helps :)
Regards,
-octo