Unicef2k has asked for the wisdom of the Perl Monks concerning the following question:

I usually go to a specific web site on a daily basis. On this web site, I have to log in: I enter my user name and password, then click a login button. This leads me to another web page that has a list of links; I'm interested in the second one. What packages/modules can I use to access the page I want directly, given my username and password? I know that I can use the LWP module to retrieve the HTML source for a page, but that's all I know about LWP.

Re: Accessing web pages
by lhoward (Vicar) on Jun 05, 2000 at 04:15 UTC
    It depends on how the site handles user IDs and passwords.

    If the login is done through a Netscape-style user ID/password popup, you can pass your user ID and password by setting your credentials with the HTTP::Request authorization_basic method. The lwpcook documentation that comes with LWP has a good example.
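A minimal sketch of the popup (HTTP Basic authentication) case described above, using the authorization_basic method the reply names. The URL, user name and password are placeholders, not values from the original question. The request is built first and then sent through an LWP::UserAgent; if the network is unreachable, LWP returns an error response rather than dying.

```perl
use strict;
use warnings;
use HTTP::Request;
use LWP::UserAgent;

# Hypothetical URL and credentials: substitute the real site's values.
my $url  = 'http://www.example.com/members/links.html';
my $user = 'myname';
my $pass = 'secret';

# Build the request and attach an "Authorization: Basic ..." header,
# which is what the browser sends after you fill in the popup.
my $req = HTTP::Request->new(GET => $url);
$req->authorization_basic($user, $pass);

# Send it; a failed connection comes back as an error response.
my $ua  = LWP::UserAgent->new(timeout => 10);
my $res = $ua->request($req);
print $res->is_success ? $res->decoded_content : $res->status_line, "\n";
```

Called with no arguments in list context, `$req->authorization_basic` decodes the header back into the user name and password, which is handy for checking what will be sent.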

    If the login is done with a user ID/password form and cookies, you can use HTTP::Cookies to set up a cookie jar and then simulate your login. You set up a cookie jar for an LWP::UserAgent as follows:

    use LWP::UserAgent;
    use HTTP::Cookies;
    my $agent = LWP::UserAgent->new;
    my $co = HTTP::Cookies->new(file => './stored.cookies', autosave => 1);
    $agent->cookie_jar($co);
    Then you would do one POST to "simulate" a login and get the cookies set, then just retrieve the page using $agent; all the appropriate cookies set by the login step should be passed along. Depending on how the site was written, it may take some tweaking to get this working just right.
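The two steps above (one POST to log in, then a GET that reuses the cookies) can be sketched as follows. This is a sketch under assumptions: the login and links URLs and the form field names `username` and `password` are hypothetical, and the real ones have to be read from the site's login form (its `action` attribute and its `<input name=...>` fields). A redirect after the login POST is treated as normal, because LWP does not follow redirects for POST requests by default.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical URLs: take the real ones from the site's login form and links page.
my $login_url = 'http://www.example.com/login.cgi';
my $links_url = 'http://www.example.com/links.html';

my $ua = LWP::UserAgent->new(timeout => 10);
$ua->cookie_jar(HTTP::Cookies->new(file => './stored.cookies', autosave => 1));

# Step 1: "simulate" the login by posting the form fields (hypothetical names);
# any Set-Cookie headers in the reply are stored in the jar.
my $login = $ua->post($login_url, [ username => 'myname', password => 'secret' ]);
warn 'Login failed: ', $login->status_line, "\n"
    unless $login->is_success || $login->is_redirect;

# Step 2: fetch the page you want; the jar sends the session cookies back.
my $page = $ua->get($links_url);
print $page->is_success ? $page->decoded_content : $page->status_line, "\n";
```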

    There are other ways that sites can do user/password login authentication, but they are almost never used. The two above cover about 99% of all sites that do some sort of login.

Re: Accessing web pages
by Asim (Hermit) on Jun 06, 2000 at 00:21 UTC
    Others have answered you well! A couple of extra, minor points:
    1) There is a Perl script called HTTPsniffer.pl, which is near-perfect for what you're trying to do. As the name implies, it "sniffs" the HTTP sessions between your browser and the server, writing them to a log file. That lets you see exactly what you're sending and receiving, including cookies, URL strings, POST data, and the like. I've used it with great success in the recent past.
    It's by Tim Meadowcroft, and you can find it at http://www.compansr.demon.co.uk/
    2) If you end up with an SSL connection, for some perverse reason, you can get a copy of CURL-SSL, a command-line tool for retrieving Web pages. With some fun piping, you can sling SSL data back and forth for hours... :) Get it at http://curl.haxx.nu/
    ----Asim, known to some as Woodrow.
      Just an update: HTTPSniffer.pl has moved here.
RE: Accessing web pages
by ttatum (Initiate) on Jun 05, 2000 at 17:56 UTC
    If the username and password are handled via HTTP authentication (the browser pops up a login/password box), then you can put them in the URL, http://login:password@www.foo.bar/links, and this will get you into the page... Thomas
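Credentials in the URL are simply another way of supplying HTTP Basic authentication. A minimal sketch, assuming you want to make that explicit in Perl: pull the `login:password` part out with the URI module, strip it from the URL, and send it as an Authorization header instead. The host and credentials are the placeholders from the reply above.

```perl
use strict;
use warnings;
use URI;
use URI::Escape qw(uri_unescape);
use HTTP::Request;

# The URL form from the reply above; login/password are placeholders.
my $uri = URI->new('http://login:password@www.foo.bar/links');

# Split "login:password" out of the URL and decode any %-escapes.
my ($user, $pass) = map { uri_unescape($_) } split /:/, $uri->userinfo, 2;

# Remove the credentials from the URL and send them as a Basic header instead.
$uri->userinfo(undef);
my $req = HTTP::Request->new(GET => $uri);
$req->authorization_basic($user, $pass);

print $req->as_string;
```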
Re: Accessing web pages
by Unicef2k (Initiate) on Jun 05, 2000 at 08:36 UTC
    What if there was a web page that had buttons or check boxes? How can I click the button or check a check box?
      If your page has an HTML form with a submit button that uses the POST method, you can solve your problem very simply. Here is an example from the lwpcook manpage:
      use HTTP::Request::Common qw(POST);
      use LWP::UserAgent;
      my $ua = LWP::UserAgent->new;
      my $req = POST 'http://www.perl.com/cgi-bin/BugGlimpse',
                     [ search => 'www', errors => 0 ];
      print $ua->request($req)->as_string;
      All you have to do is set up a POST request message with all the fields in your HTML form (check boxes, text fields, and so on), with the correct URL. I hope this helps.
      marcos
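To answer the check-box part of the question with the POST approach above: a browser sends a checked box as its name=value pair, sends nothing for an unchecked box, and repeats the name when several boxes sharing one name are checked. A submit button's own name=value is sent only if the button has a name. A minimal sketch, assuming a hypothetical form whose action, field names and values are made up for illustration, that builds such a request without sending it:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical form: action "http://www.example.com/cgi-bin/prefs" with a text
# field "name", checkboxes named "topic" and a submit button named "go".
my $req = POST 'http://www.example.com/cgi-bin/prefs', [
    name  => 'myname',
    topic => 'perl',     # a checked box sends its name=value pair
    topic => 'web',      # several checked boxes with one name repeat the name
                         # an unchecked box is simply left out
    go    => 'Submit',   # a named submit button sends its own name=value
];

print $req->content, "\n";   # name=myname&topic=perl&topic=web&go=Submit
```

To actually submit it, pass `$req` to `$ua->request` as in the lwpcook example.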
      If your page has buttons and checkboxes (apart from the username/password fields), then it is a form that submits to a CGI script. Find the URL of the page it calls (or, even better, the form's "action", since the page might redirect you). Then call that URL (or the action script) with the parameters you want in the Location bar, thus simulating a form GET. Even if the form is normally sent with POST, the script should accept the parameters on the URL if it's written with CGI.pm (I don't know about others). If that doesn't work, then I'm afraid I cannot help.
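The "parameters on the Location bar" approach can be scripted too. A minimal sketch, assuming a hypothetical action URL and field names copied from the page's form: the URI module's query_form method builds the escaped query string, and a plain GET fetches it.

```perl
use strict;
use warnings;
use URI;
use LWP::UserAgent;

# Hypothetical action URL and field names, copied from the page's <form> tag.
my $uri = URI->new('http://www.example.com/cgi-bin/prefs');
$uri->query_form(name => 'myname', topic => 'perl', topic => 'web');
print "$uri\n";   # http://www.example.com/cgi-bin/prefs?name=myname&topic=perl&topic=web

# Fetch it as a GET, which is what typing the URL into the Location bar does.
my $res = LWP::UserAgent->new(timeout => 10)->get($uri);
print $res->status_line, "\n";
```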