Re: Question Regarding Regular Expressions and Negated Character Classes

I don't think a single regular expression is the best way to solve this problem, since it either matches or it doesn't: you can't print, log, or otherwise report why there is no result. The shortest solution I would want to use still uses three regexen:

if ($html =~ m#<noframes>(.+?)</noframes>#is)
{
    my $noframes = $1;    # copy the capture before running further matches
    @urls = grep { $_ !~ m#microsoft\.com|netscape\.com#i }
            $noframes =~ m#<a href="([^"]+)"#gi;
}
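The weakness of the compact `<a href="..."` pattern is easy to see on single tags. Here is a minimal sketch that runs it side by side with the looser `href\s*=\s*"..."` pattern from the longer solution. The sample tags are made up for illustration and are not from the original question.

```perl
use strict;
use warnings;

# Made-up sample tags: href first, href after another attribute,
# and spaces around "=".
my @tags = (
    '<a href="http://example.org/">',
    '<a class="ext" href="http://example.net/">',
    '<a href = "http://example.com/">',
);

my (@strict_hits, @loose_hits);
for my $tag (@tags)
{
    # Compact version's pattern: href must come right after "<a ".
    my ($strict) = $tag =~ m#<a href="([^"]+)"#i;
    # Looser pattern: href anywhere in the tag, optional spaces around "=".
    my ($loose) = $tag =~ m#href\s*=\s*"([^"]+)"#i;

    push @strict_hits, $strict if defined $strict;
    push @loose_hits,  $loose  if defined $loose;

    print "$tag\n",
          "  strict: ", ($strict // '(no match)'), "\n",
          "  loose:  ", ($loose  // '(no match)'), "\n";
}
# The strict pattern finds only the first URL; the loose one finds all three.
```

Only the first tag is caught by the compact pattern; the other two links would be silently dropped.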

However, there is still a problem with this: it assumes that "href" directly follows "<a", which is by no means necessary. So here is a slightly longer, but more readable and more robust, solution:

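my $noframes;
if ($html =~ m#<noframes>(.+?)</noframes>#is)
{
    $noframes = $1;
}
else
{
    die "Couldn't find noframes tags";
}

# Visit every <a ...> tag; the \s after "a" keeps <abbr>, <area> etc. out.
while ($noframes =~ m#(<a\s[^>]*>)#gi)
{
    my $link = $1;
    # href may appear anywhere inside the tag, with spaces around "=".
    my ($url) = $link =~ m#href\s*=\s*"([^"]+)"#i;

    if ($url and $url !~ m#microsoft\.com|netscape\.com#i)
    {
        push @urls, $url;
    }
}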
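Splitting the work into steps is what makes error reporting possible. As a minimal sketch, the longer version can be wrapped in a sub that dies with a message naming the step that failed. The sub name `extract_noframes_links`, passing the exclusion list as a `qr//` object, and the sample markup are all assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: the step-by-step extraction, reporting which
# step failed instead of just returning nothing.
sub extract_noframes_links
{
    my ($html, $exclude) = @_;    # $exclude: qr// of URLs to skip

    my ($noframes) = $html =~ m#<noframes>(.+?)</noframes>#is
        or die "Couldn't find noframes tags\n";

    my @urls;
    my $links = 0;    # <a> tags that carried a double-quoted href
    while ($noframes =~ m#(<a\s[^>]*>)#gi)
    {
        my $link = $1;
        my ($url) = $link =~ m#href\s*=\s*"([^"]+)"#i
            or next;    # an <a> tag without a usable href
        $links++;
        push @urls, $url unless $url =~ $exclude;
    }

    die "No links with an href inside noframes\n" unless $links;
    die "All links found ($links) were excluded\n" unless @urls;
    return @urls;
}

# Usage with made-up markup:
my $sample = <<'HTML';
<noframes>
  <a href="http://example.org/">Example</a>
  <a href="http://www.netscape.com/">Netscape</a>
  <a class="ext" href="http://example.net/">Also found</a>
</noframes>
HTML

my @found = extract_noframes_links($sample, qr/microsoft\.com|netscape\.com/i);
print "$_\n" for @found;    # prints http://example.org/ and http://example.net/
```

Callers can wrap the call in `eval { ... }` and log `$@` to find out why no URLs came back.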

I didn't come up with a regex-only solution, mainly due to lack of time, but the only reason I would write a solution to this problem entirely as one regex would be to practise my regex skills ;) Hope this helps :)

Regards,
-octo
