Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to grab a webpage that requires a username and a password to log in. I have both, but how do I program it in Perl so that get() can supply them and fetch the HTML code for the webpage?

Chris H

  • Comment on Grabbing Webpages with Usernames and Passwords

Re: Grabbing Webpages with Usernames and Passwords
by valdez (Monsignor) on Feb 20, 2003 at 20:54 UTC

    It is very simple to get a protected page using LWP::UserAgent. A little example:

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Request;
    use HTTP::Cookies;

    # placeholders (an assumption): pass your own URL and credentials as arguments
    my ($url, $username, $password) = @ARGV;

    # set up your browser
    my $ua = LWP::UserAgent->new(keep_alive => 1, timeout => 300);

    # what kind of browser you are
    $ua->agent("Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3a) Gecko/20021207 Phoenix/0.5");

    # a place to store cookies, if needed
    $ua->cookie_jar( HTTP::Cookies->new(file => "perlcookies.dat", autosave => 1) );

    # now build a request
    my $request = HTTP::Request->new( GET => $url );

    # set credentials
    $request->authorization_basic($username, $password);

    # run browser
    my $response = $ua->request($request);

    # and check response
    if ($response->is_success) {
        print $response->content, "\n";
    } else {
        die "failed: ", $response->message, "\n";
    }

    Happy downloading :) Valerio

      I tried that code ... what does this error mean?

      Can't locate MIME/Base64.pm in @INC (@INC contains: /usr/lib/perl5/5.6.0/i386-linux /usr/lib/perl5/5.6.0 /usr/lib/perl5/site_perl/5.6.0/i386-linux /usr/lib/perl5/site_perl/5.6.0 /usr/lib/perl5/site_perl .) at /usr/lib/perl5/site_perl/5.6.0/HTTP/Headers.pm line 588.

        You must install the MIME::Base64 module. Download it from CPAN, unpack it, and follow the instructions in its README. If you want a simpler way, just fire up a CPAN shell (perl -MCPAN -e shell) and type install MIME::Base64.

        Ciao, Valerio

        It means you need to install the MIME::Base64 module. Base64 is used to encode the username and password when accessing web pages protected with HTTP basic auth.
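
To see why MIME::Base64 is needed, here is a minimal sketch of what basic auth puts into the Authorization header. The helper name basic_auth_header and the sample credentials are made up for illustration; encode_base64 is the module's own function.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64);

# Build the value of the Authorization header for HTTP basic auth.
# (basic_auth_header is a hypothetical helper, not part of any module.)
sub basic_auth_header {
    my ($username, $password) = @_;
    # The empty second argument suppresses the trailing newline
    # that encode_base64 appends by default.
    return 'Basic ' . encode_base64("$username:$password", '');
}

# Sample credentials, for illustration only.
print basic_auth_header('user', 'pass'), "\n";   # prints "Basic dXNlcjpwYXNz"
```

Note that Base64 is an encoding, not encryption: anyone who sees the header can decode the credentials.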

Re: Grabbing Webpages with Usernames and Passwords
by bassplayer (Monsignor) on Feb 20, 2003 at 21:01 UTC
    I would check the source of the login page to find the names of the input fields for the username and password (perhaps username and password) and the name of the CGI script they are submitted to (perhaps login.cgi).

    I would then piece the information together into a URL thusly:

    www.domain.com/cgi-bin/login.cgi?username=<username>&password=<password>

    You might need to add other hidden variables as well. I would then use LWP to grab the desired page. I believe LWP will allow you to use POST if necessary.
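
A rough sketch of the POST variant with LWP, assuming the form really does submit fields named username and password to login.cgi and that the site tracks the login with a cookie. The URL, the field names, and the helper names build_login_request and fetch_after_login are all guesses for illustration; check the login page's source for the real ones.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Hypothetical value: replace with the real action URL of the login form.
my $login_url = 'http://www.domain.com/cgi-bin/login.cgi';

# Build (but do not send) a POST request carrying the form fields.
# Add any hidden fields from the form to the array reference as well.
sub build_login_request {
    my ($url, $username, $password) = @_;
    return POST($url, [ username => $username, password => $password ]);
}

# Log in, then fetch the protected page with the same cookie-aware agent.
sub fetch_after_login {
    my ($page_url, $username, $password) = @_;
    my $ua = LWP::UserAgent->new(cookie_jar => {});   # keep session cookies in memory

    my $login = $ua->request(build_login_request($login_url, $username, $password));
    # Many login scripts answer with a redirect, so only 4xx/5xx counts as failure here.
    die "login failed: ", $login->status_line, "\n" if $login->is_error;

    my $page = $ua->get($page_url);
    die "fetch failed: ", $page->status_line, "\n" unless $page->is_success;
    return $page->decoded_content;
}
```

Calling fetch_after_login($page_url, $username, $password) would then return the HTML of the protected page. Sending the fields in the POST body also keeps the password out of the URL.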

    bassplayer

Re: Grabbing Webpages with Usernames and Passwords
by mowgli (Friar) on Feb 21, 2003 at 09:16 UTC

    Assuming you want to do HTTP basic authentication (the kind where your browser pops up a window asking you for your username and password, as opposed to authentication where you enter your username and password in a form on another page), you can use a URL like the following:

    http://username:password@some.site/path/to/file
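
If the username or password contains characters such as @ or :, they have to be percent-encoded before going into a URL of this form. A minimal sketch using only core Perl; auth_url and escape_userinfo are hypothetical helper names, the credentials are invented, and the escaping assumes plain ASCII input.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 "unreserved" characters.
# Assumes ASCII input; wide characters would need UTF-8 encoding first.
sub escape_userinfo {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Assemble http://username:password@host/path with escaped credentials.
sub auth_url {
    my ($username, $password, $host, $path) = @_;
    return sprintf 'http://%s:%s@%s%s',
        escape_userinfo($username), escape_userinfo($password), $host, $path;
}

# Invented credentials, for illustration only.
print auth_url('chris', 'p@ss:word', 'some.site', '/path/to/file'), "\n";
# prints "http://chris:p%40ss%3Aword@some.site/path/to/file"
```

The resulting string can be handed to LWP like any other URL. As far as I know LWP turns the embedded credentials into a basic auth header, but that is worth verifying against your installed version.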
    

    I hope this helps!

    --
    mowgli