Any time I see regexes and HTML in the same question I get a queasy feeling born of unpleasant experience. So instead of answering your question, and since Ovid doesn't seem to be here just now, how about (untested):
use HTML::TokeParser::Simple;

my $page = ...;
my $alternative = find_alternate_page( $page );

sub find_alternate_page {
    my $page = shift;
    return undef unless $page;
    my $p = HTML::TokeParser::Simple->new( \$page );
    my $looking = 0;
    while ( my $token = $p->get_token ) {
        $looking = 1 if $token->is_start_tag( 'noframes' );
        return undef if $token->is_end_tag( 'noframes' );
        if ( $looking && $token->is_start_tag( 'a' ) ) {
            return $token->return_attr->{href};
        }
    }
}
Decisions about which URLs interest you and which don't are easy to make once the address is retrieved, and are probably best made separately from the retrieval itself, since you'll want to change that policy at some point.
Update: tested after all. Seems to work.
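To illustrate keeping the "which URLs interest me" policy apart from the retrieval, here is a rough sketch. The sub name is_interesting_url and both of its rules (http/https only, no obvious images) are made up for the example, and the URL list stands in for whatever the retrieval code returned; swap in whatever policy you actually need.

```perl
use strict;
use warnings;

# Hypothetical policy, kept separate from the parsing code so it can
# change freely: accept only http(s) links that are not obvious images.
sub is_interesting_url {
    my $url = shift;
    return 0 unless defined $url && length $url;
    return 0 unless $url =~ m{^https?://}i;           # assumed rule: web links only
    return 0 if $url =~ m{\.(?:gif|jpe?g|png)$}i;     # assumed rule: skip images
    return 1;
}

# Placeholder addresses standing in for whatever the retrieval returned.
my @found = (
    'http://example.com/noframes.html',
    'mailto:someone@example.com',
    'http://example.com/banner.gif',
);

# Retrieval stays policy-free; the filter is applied afterwards.
my @wanted = grep { is_interesting_url($_) } @found;
print "$_\n" for @wanted;
```

Because the filter only ever sees a plain string, changing the policy later means editing one small sub and leaving the token-walking code alone.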
In reply to Re: Question Regarding Regular Expressions and Negated Character Classes
by thpfft
in thread Question Regarding Regular Expressions and Negated Character Classes
by Anonymous Monk