LE500 has asked for the wisdom of the Perl Monks concerning the following question:

I am currently trying to write a Perl program that uses an HTTPS website. I can access the website fine. However, as soon as I try to submit any kind of form (even just with click_button), I get a 401 Unauthorized error.

What could I be doing wrong? How come my credentials work fine when I go to a page on the site, but as soon as I try any kind of form it leads to an error? Just to be clear, I am a newbie at web-using programs, so my coding might be sad. In this example, all I'm trying to do is click an "if you agree, click here" button to get past a disclaimer (get redirected here). Sorry for editing out the website, but it's not a public site.

Update: I tried just a simple follow_link, with the same unauthorized result. Just to be clear, my credentials do let me go to a protected page (with a get(URL) statement). What fails is clicking a link or submitting a form to go to a different page (even pages that work if I just use a get(URL) statement). I don't know if this will help, but without the MIME::Base64 Authorization header, I can't even 'jump' to a page with get(URL). Still working on it.

#!/usr/bin/perl
use strict;
use warnings;
require LWP::UserAgent;
require HTTP::Request::Common;
require HTTP::Response;
use HTML::Form;
use WWW::Mechanize;
use MIME::Base64;

my $username = 'username';
my $password = 'password';
# the second argument '' stops encode() from appending a newline to the header value
my @args = ( Authorization => "Basic "
    . MIME::Base64::encode( $username . ':' . $password, '' ) );

my $mech = WWW::Mechanize->new();

# I can remove the realm/host without affecting the 200 -> 401 occurrence
$mech->credentials( 'www.edited.com:443', 'Title', $username, $password );

# enabling cookies, I'm not sure if I even need this part
use HTTP::Cookies;
$mech->cookie_jar( HTTP::Cookies->new(
    # where to read/write cookies (Perl does not expand '~' in file names)
    'file'     => "$ENV{HOME}/Desktop/programming/cookies.lwp",
    'autosave' => 1,    # save it to disk when done
) );

# the site
my $url = "https://www.edited.com/disclaimer.cgi";
$mech->get( $url, @args );

# HTML prints normally
print $mech->content;

# then I just want to click the button B1
$mech->click_button( name => 'B1' );

# just in case it was a delay issue
sleep(5);

# then I get a 401 error
print $mech->status(), "\n";
Thank you.
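The question builds the Authorization header by hand. Below is a minimal sketch of doing that with the core MIME::Base64 module, assuming plain HTTP Basic authentication. By default encode_base64 appends a newline to its output, which does not belong inside a header value; passing an empty string as the second argument suppresses it. Note also that headers passed as extra arguments to get() go out with that one request only; the requests that click_button or follow_link build afterwards do not carry them. The helper name basic_auth_header is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Hypothetical helper: build the value of an HTTP Basic "Authorization" header.
sub basic_auth_header {
    my ( $user, $pass ) = @_;
    # The second argument '' suppresses the trailing "\n" encode_base64 adds by default.
    return 'Basic ' . encode_base64( "$user:$pass", '' );
}

my $with_newline = 'Basic ' . encode_base64('username:password');    # default eol "\n"
my $clean        = basic_auth_header( 'username', 'password' );

print "default: [$with_newline]\n";
print "clean:   [$clean]\n";
```

If a hand-built header really has to accompany every request, WWW::Mechanize's add_header method is one alternative; the replies below instead focus on getting credentials() right so LWP answers the 401 challenge itself.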

Re: HTTPS WWW::Mechanize Form Problems
by zentara (Cardinal) on Jan 21, 2009 at 17:33 UTC
Re: HTTPS WWW::Mechanize Form Problems
by perrin (Chancellor) on Jan 21, 2009 at 18:35 UTC
    You can debug this by looking at the request Mech sends and checking to see that your expected credentials are in it. The debugger can be handy for this.
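One way to do that check: take the Authorization header from the request that was actually sent (an HTTP::Response keeps the request that produced it, so something like `$mech->res->request->header('Authorization')` after the call) and decode it. This is a minimal sketch using only MIME::Base64; the helper name is hypothetical and the header string is a stand-in for whatever your own dump shows.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: return (user, password) from a Basic Authorization
# header value, or an empty list if the header is missing or not Basic.
sub basic_credentials_from_header {
    my ($header) = @_;
    return unless defined $header && $header =~ /^\s*Basic\s+(\S+)/i;
    # Limit of 2 keeps any ':' inside the password intact.
    my ( $user, $pass ) = split /:/, decode_base64($1), 2;
    return ( $user, $pass );
}

# Stand-in for a header value copied from a request dump.
my $sent = 'Basic dXNlcm5hbWU6cGFzc3dvcmQ=';
my ( $user, $pass ) = basic_credentials_from_header($sent);
print defined $user ? "sent user '$user'\n" : "no Basic credentials sent\n";
```

If the request behind a 401 carries no Authorization header at all, LWP never found credentials to send, which points at the netloc/realm lookup discussed in the next reply.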
Re: HTTPS WWW::Mechanize Form Problems
by bart (Canon) on Jan 23, 2009 at 20:06 UTC
    The site you pointed us to in the CB has the credentials realm "Automated Splice Site Analyses". I don't see that anywhere in your code. Instead, your realm is simply "Title":
    $mech->credentials( 'www.edited.com:443', 'Title', $username, $password );
    Change the second parameter to "Automated Splice Site Analyses" and you might have more luck.
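The realm is whatever the server announces in the WWW-Authenticate header of its 401 response (with WWW::Mechanize, something like `$mech->res->header('WWW-Authenticate')` after the failed request). A minimal sketch that pulls the realm out of such a header value, assuming a single Basic challenge with a quoted realm; the helper name is hypothetical and the header string is illustrative, built from the realm name quoted above rather than captured from the site.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the realm from a Basic WWW-Authenticate value,
# or nothing if the challenge is not Basic or has no quoted realm.
sub basic_realm {
    my ($challenge) = @_;
    return unless defined $challenge;
    return $1 if $challenge =~ /^\s*Basic\s+realm="([^"]*)"/i;
    return;
}

# Illustrative header value; in practice copy it from the 401 response.
my $challenge = 'Basic realm="Automated Splice Site Analyses"';
my $realm     = basic_realm($challenge);
print defined $realm ? "realm: $realm\n" : "no Basic realm found\n";
```

The string it returns is what goes in the second argument of the four-argument credentials() call.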

    FYI here's the relevant code in LWP::UserAgent:

    sub credentials {
        my $self   = shift;
        my $netloc = lc(shift);
        my $realm  = shift || "";
        my $old    = $self->{basic_authentication}{$netloc}{$realm};
        if (@_) {
            $self->{basic_authentication}{$netloc}{$realm} = [@_];
        }
        return unless $old;
        return @$old if wantarray;
        return join(":", @$old);
    }
    As you can see, if $realm has the wrong value (one that isn't a key in the HoHoA, the hash of hashes of arrays), nothing will be returned.
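To see why a wrong realm silently finds nothing, here is a stripped-down model of the nested hash that credentials() reads and writes. It is only a sketch of the lookup structure in the code above, not LWP itself; the sub names set_credentials and lookup_credentials are hypothetical, and the host and account values are the placeholders from the question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Model of $self->{basic_authentication}{$netloc}{$realm} = [ $user, $pass ]
my %basic_authentication;

# Hypothetical: store credentials the way credentials() does.
sub set_credentials {
    my ( $netloc, $realm, @userpass ) = @_;
    $basic_authentication{ lc $netloc }{ $realm || '' } = [@userpass];
}

# Hypothetical: look them up again by exact netloc and realm.
sub lookup_credentials {
    my ( $netloc, $realm ) = @_;
    my $found = $basic_authentication{ lc $netloc }{ $realm || '' };
    return $found ? @$found : ();
}

set_credentials( 'www.edited.com:443', 'Title', 'username', 'password' );

# The server asks for its own realm, so the entry stored under 'Title' is never found.
my @wrong = lookup_credentials( 'www.edited.com:443', 'Automated Splice Site Analyses' );
my @right = lookup_credentials( 'www.edited.com:443', 'Title' );
print scalar(@wrong), " vs ", scalar(@right), "\n";    # prints "0 vs 2"
```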

    Update: WWW::Mechanize overrides this method and allows a two-argument form: just the username and password. That way you can avoid this mess with the realm altogether. Thanks for the tip, erix, although maybe unintentional... :)

    $mech->credentials( $username, $password );
Re: HTTPS WWW::Mechanize Form Problems
by imrags (Monk) on Jan 22, 2009 at 07:49 UTC
    A few things:
    1. Check whether there's JavaScript in the page. If there is, forget Mechanize; it might not work.
    2. If you're using IE, use Win32::IEAutomation; it might make such pages easier to handle.
    3. Instead of sleep(5), use the Win32::IEAutomation function that waits for the page to finish loading.
    See the node "Win32:: How to tell when Default browser url has loaded". That approach actually waits until the page loads if there is a lag, but it works with IEAutomation only.
    Raghu

      No JavaScript to worry about. I'm using Linux, so IE isn't being used.

      Does anyone know any tips and tricks for installing modules that just won't install? Is PerlMonks a good place to ask this question, or is this something I should bring up at CPAN?

        If you're getting a 401 error for an https URL, your setup can already request https URLs. If you're getting a 500 error for an https URL, you likely don't have Crypt::SSLeay or IO::Socket::SSL installed.

        If you're not on Windows, I recommend installing the prerequisite libraries (the OpenSSL libraries and headers) via the package manager of your OS. Then install the Perl module.
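A quick way to see which of the HTTPS-related modules named in this thread Perl can actually load, before digging into 500 errors. This is a sketch: it only tries to require each module by name and reports the result, so it does not prove that LWP is configured to use any of them. The helper name can_load is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: try to load a module by name at run time; 1 on success, 0 otherwise.
sub can_load {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(Crypt::SSLeay IO::Socket::SSL Net::SSLeay)) {
    my $ok      = can_load($module);
    my $version = $ok ? ( $module->VERSION // 'unknown' ) : '';
    printf "%-16s %s\n", $module, $ok ? "installed $version" : 'not found';
}
```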

Re: HTTPS WWW::Mechanize Form Problems
by whakka (Hermit) on Jan 21, 2009 at 19:13 UTC
      This is probably the situation, but I keep getting errors when I try to install OpenSSL and Net::SSLeay. Thanks for the help, though. EDIT: they did not help, which is strange.
Re: HTTPS WWW::Mechanize Form Problems
by Corion (Patriarch) on Jan 23, 2009 at 22:32 UTC

    The following code works for me:

    #!/usr/bin/perl
    use strict;
    use WWW::Mechanize;

    print "Versions\n";
    for (qw(WWW::Mechanize LWP::UserAgent)) {
        print "$_\t" . $_->VERSION, "\n";
    };

    my $url  = 'https://www.edited.com/';
    my $user = 'user';
    my $pass = 's3cr1t';

    my $m = WWW::Mechanize->new();
    $m->credentials($user => $pass);
    $m->get($url);
    print $m->follow_link(text => 'Enter');
    print $m->title, "\n";
    print $m->uri, "\n";
    $m->click_button(name => 'B1');
    print $m->uri, "\n";
    print $m->status, "\n";

    It outputs for me:

    Versions
    WWW::Mechanize	1.52
    LWP::UserAgent	5.814
    HTTP::Response=HASH(0x11fea1c)Automated Splice Site Analyses
    https://www.edited.com/cgi-bin/protected/disclaimer.cgi
    https://www.edited.com/cgi-bin/protected/menu.cgi?menu_gene.html
    200

    So I assume it has something to do with the versions of the module(s) you have installed.

      This post fixed my problem. The code is much better, but the tipping point was "Mechanize 1.52", I had 1.34. A simple upgrade through CPAN fixed my problem. I am so happy. Thank you Corion and everyone at the Chatterbox.
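Since the fix here turned out to be upgrading WWW::Mechanize from 1.34 to 1.52, a script can refuse to run against an older copy. A sketch using the standard VERSION class method, which dies when the installed version is lower than the one requested; 1.52 is simply the version reported to work in this thread, not a documented minimum, and has_min_version is a hypothetical helper name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true if $module loads and is at least $min_version.
sub has_min_version {
    my ( $module, $min_version ) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    # VERSION($min) dies if the loaded module is older than $min.
    return eval { require $file; $module->VERSION($min_version); 1 } ? 1 : 0;
}

if ( has_min_version( 'WWW::Mechanize', '1.52' ) ) {
    print "WWW::Mechanize is new enough\n";
}
else {
    print "Please upgrade WWW::Mechanize (for example from the cpan shell)\n";
}
```

The compile-time equivalent is `use WWW::Mechanize 1.52;`, which stops the script immediately with a version error instead of printing a message.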
Re: HTTPS WWW::Mechanize Form Problems
by LE500 (Initiate) on Jan 23, 2009 at 19:49 UTC

    An update:

    I've installed all the suggested modules and tried multiple ways to get around this, and I still haven't gotten it to work. It just won't accept the authentication and takes me to a "username/password wrong" page instead of the next page. To add to the confusion, with my authentication steps I can go to any page on the website through get($url), but I can't submit a form or follow a link to a new page. Once I try, even get($url) stops working. Here's a quick update, a simpler version of my code (just following a link now instead of a form):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # left some comments to show what I've tried
    #require LWP::UserAgent;
    #require HTTP::Request::Common;
    #require HTTP::Response;
    #use Crypt::SSLeay;
    #use IO::Socket::SSL;
    #use CGI::Form;
    #use HTML::Form;
    use WWW::Mechanize;
    use MIME::Base64;

    my $username = 'name';
    my $password = 'password';
    my $website  = "https://www.url.com/";
    my @args = ( Authorization => "Basic "
        . MIME::Base64::encode( $username . ':' . $password, '' ) );

    my $mech = WWW::Mechanize->new();
    # LWP looks credentials up by host:port, so the port is part of the key
    $mech->credentials( 'www.url.com:443', 'Automated Analyses', $username, $password );

    # enabling cookies
    use HTTP::Cookies;
    $mech->cookie_jar( HTTP::Cookies->new() );

    my $url = "https://www.url.com/";
    $mech->get($url);

    # getting links (links() returns a list in list context)
    my @links = $mech->links;

    # the 'enter' link on the front page
    $mech->follow_link( n => 5 );

    # and this prints the "401 error" page
    print "\n\n", $mech->content, "\n";