gator has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am using the WWW::Mechanize module to automate logging in to a web page. The code I have written is:

use WWW::Mechanize;
use Log::Log4perl qw(:easy);

Log::Log4perl->easy_init($ERROR);

# The starting point URL
my $start_url = "http://***.**.**.*:****";
my $ret_val;
my $redir_pattern = 'HTTP-EQUIV="Refresh"\s*CONTENT="0;URL=';
my @temp_array;
my $redir_url;

# Create a new instance of WWW::Mechanize
my $agent = WWW::Mechanize->new();

# Retrieve the page
$agent->get($start_url);
my $content = $agent->content();
print $content;

my @links = $agent->find_all_links();
print $#links;

The output I get is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Inventum - Service Selection Gateway</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<META HTTP-EQUIV="PRAGMA" CONTENT="NO-CACHE"/>
<META HTTP-EQUIV="EXPIRES" CONTENT="-1"/>
<META HTTP-EQUIV="Refresh" CONTENT="0;URL=http://***.**.**.*:****/ssgusercgi/login.ssg?ip=***.**.**.*:****&mac=**:**:**:**:**:**&requestip=***.**.**.*&requesturi=http%**%**%*****.**.**.*%******%**">
</head>
<body>
<p align="center">Please wait...<p>
</body>
</html>
-1

Please note that the HTML contains a <META HTTP-EQUIV="Refresh" ...> tag, which find_all_links does not recognize: print $#links; prints -1, so @links is empty. Am I doing anything wrong there? Please help...

Regards,
Gator

Replies are listed 'Best First'.
Re: Mechanize not recognizing link in <META HTTP-EQUIV="Refresh"
by Corion (Patriarch) on Nov 01, 2007 at 08:59 UTC

    Your problem is that

    <META http-equiv="...

    is not a link tag, and hence WWW::Mechanize does not report it, especially as the content attribute does not contain just a URL but a timeout followed by a URL. I suggest you extract the target URL manually via a regular expression if the website you're parsing is fairly static, or use HTML::TokeParser or HTML::HeadParser if you have to deal with many types of pages.

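    my $c = $agent->content;
    # The content attribute looks like "0;URL=http://...", so match the
    # delay and the "URL=" prefix, and capture only the URL itself.
    if ($c =~ m!<meta\s+http-equiv=['"]refresh['"]\s+content=['"]\d+\s*;\s*url=([^'"]+)['"]!si) {
        my $target = $1;
        warn "Refresh found (-> $target), following";
        # get() is a method of the Mechanize object, not of the content string
        $agent->get($target);
    }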
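
    If the regex turns out to be too fragile, the HTML::TokeParser route mentioned above could look roughly like this. It is only a sketch: meta_refresh_url is a made-up helper name, and the example page and its example.com URL are placeholders rather than anything from the original post.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the target of the first
# <meta http-equiv="refresh"> tag in $html, or undef if there is none.
sub meta_refresh_url {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html) or return;

    # get_tag('meta') skips ahead to the next <meta ...> start tag;
    # attribute names come back lower-cased.
    while (my $tag = $parser->get_tag('meta')) {
        my $attr    = $tag->[1];
        my $equiv   = $attr->{'http-equiv'};
        my $content = $attr->{content};
        next unless defined $equiv && lc $equiv eq 'refresh' && defined $content;

        # content looks like "0;URL=http://...": a delay, then the URL
        if ($content =~ /^\s*\d+\s*;\s*url\s*=\s*(\S+)/i) {
            my $url = $1;
            $url =~ s/^(["'])(.+)\1$/$2/;   # drop optional surrounding quotes
            return $url;
        }
    }
    return;
}

# Example with a made-up page (placeholder URL, not from the post)
my $page = '<html><head><META HTTP-EQUIV="Refresh" '
         . 'CONTENT="0;URL=http://example.com/login"></head></html>';
my $url = meta_refresh_url($page);
print defined $url ? "Refresh target: $url\n" : "No meta refresh found\n";
```

    In the original script you would presumably call it as meta_refresh_url($agent->content) and pass the result to $agent->get, after checking that it is defined.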

    A quick Google search for www mechanize refresh turns up WWW::Mechanize follow meta refreshes...

Re: Mechanize not recognizing link in <META HTTP-EQUIV="Refresh"
by ForgotPasswordAgain (Vicar) on Nov 01, 2007 at 09:02 UTC

    Try looking at the module's source code. I quickly found _link_from_token, which contains this:

    if ( $tag eq 'meta' ) {
        my $equiv   = $attrs->{'http-equiv'};
        my $content = $attrs->{'content'};
        return unless $equiv && (lc $equiv eq 'refresh') && defined $content;

        if ( $content =~ /^\d+\s*;\s*url\s*=\s*(\S+)/i ) {
            $url = $1;
            $url =~ s/^"(.+)"$/$1/ or $url =~ s/^'(.+)'$/$1/;
        }
        else {
            undef $url;
        }
    } # meta

    Does your URL match that regex? Maybe there is a bug.
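
    One way to check that without touching the network is to run the same pattern against the CONTENT value from the posted output. A minimal sketch, assuming the "+" signs in the post are only line-wrap markers and can be dropped; the variable names are just for illustration, and the masked URL is copied as posted:

```perl
use strict;
use warnings;

# CONTENT value copied from the posted output (masking kept as posted,
# line-wrap markers removed).
my $content = '0;URL=http://***.**.**.*:****/ssgusercgi/login.ssg'
            . '?ip=***.**.**.*:****&mac=**:**:**:**:**:**'
            . '&requestip=***.**.**.*'
            . '&requesturi=http%**%**%*****.**.**.*%******%**';

# Same pattern and quote stripping as in the _link_from_token snippet
my $url;
if ($content =~ /^\d+\s*;\s*url\s*=\s*(\S+)/i) {
    $url = $1;
    $url =~ s/^"(.+)"$/$1/ or $url =~ s/^'(.+)'$/$1/;
}

print defined $url ? "matched: $url\n" : "no match\n";
```

    If this reports a match but find_all_links still returns nothing, the installed WWW::Mechanize is the more likely place to look than the pattern itself.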

      Of course that will match; he probably has a really old version of WWW::Mechanize.