debiandude has asked for the wisdom of the Perl Monks concerning the following question:

Hey! I am trying to get LWP to work with this website; however, it's not going too well. I think I am able to log in, but then it redirects me to the login page again.


Note: This is test data. I don't actually use this login info.
Your User ID : petnuc@cooper.edu
Your Password : vhNVf0jH
URL : https://noii.nasdaqtrader.com


And here is my code:

#!/usr/local/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Cookies;

my $browser = LWP::UserAgent->new;
$browser->cookie_jar(
    HTTP::Cookies->new( 'file' => "$ENV{HOME}/.cookies.txt", 'autosave' => 1 )
);

my $url = 'https://noii.nasdaqtrader.com';

# POST the login form
my $response = $browser->post( $url,
    [ txtUserName => '*****', txtUserPass => '*****' ] );

# Follow each redirect by hand
while ( $response->is_redirect ) {
    my $location = $url . $response->header("location");
    $response = $browser->get($location);
    if ( $response->is_success ) {
        print $response->content;
    }
    else {
        print $response->status_line;
    }
}
I am not really sure what I am doing wrong and why it is not going on to the next page. I thought it was a redirect issue, because when I first tried it I was getting 302 errors, but this approach is not working either. Any clues? Thanks.

Edited 2004-08-04 by Ovid -- hid username and password

Re: LWP and Site Logins
by Fletch (Bishop) on Aug 04, 2004 at 18:02 UTC

    OK, first off: IT IS EXTREMELY BONEHEADED TO POST YOUR LOGIN AND PASSWORD TO A PUBLIC WEB SITE.

    Ehem. Now that that's out of the way: you need to examine the reply and check whether you're being passed back some sort of cookie once you've successfully logged in. If you're not saving it, you're most likely getting bounced back to the login page each time.
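To make the cookie point concrete, here is an offline sketch of what a cookie jar does with a login cookie. It feeds HTTP::Cookies a made-up login response carrying a `Set-Cookie` header (the cookie name `ASP.NET_SessionId`, its value, and the `/default.aspx` path are hypothetical, not taken from the site), then lets the jar add a matching `Cookie` header to the next request. LWP::UserAgent performs these two steps itself once a `cookie_jar` is set; nothing here touches the network.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# In-memory jar; pass file => ..., autosave => 1 to persist it on disk instead.
my $jar = HTTP::Cookies->new;

# Hypothetical login response: the server hands back a session cookie.
my $login_req = HTTP::Request->new( POST => 'https://noii.nasdaqtrader.com/' );
my $login_res = HTTP::Response->new( 302, 'Found', [
    'Set-Cookie' => 'ASP.NET_SessionId=abc123; path=/',
    'Location'   => '/default.aspx',
] );
$login_res->request($login_req);    # the jar uses the request to learn the host

# Store any cookies the response set.
$jar->extract_cookies($login_res);

# A later request to the same host gets the cookie attached.
my $next_req = HTTP::Request->new( GET => 'https://noii.nasdaqtrader.com/default.aspx' );
$jar->add_cookie_header($next_req);

print 'Cookie: ', ( $next_req->header('Cookie') // '(none)' ), "\n";
```

If the jar stays empty after a real login, the server probably never set a cookie on that response, which points back at the request itself rather than at cookie handling.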

    Update: As Ovid remarked, stupid things have been done in the past (and will more than likely be done again in the future); better safe than getting someone cracked or rooted or what not. At any rate, check out perldoc lwptut for info on how to save cookies, or check out WWW::Mechanize as diotalevi points out.

    Update 2: Wait, you do have a cookie jar (I missed that in my haste to chastise :). Hrmm, maybe I should just go take a nap.

      That would be a great reason to use WWW::Mechanize so it will just handle those details automatically. Either that or go fetch LWP::Simple::Cookies which I wrote before I knew about WWW::Mechanize. Now I don't bother with LWP and go right to W'M.
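A minimal WWW::Mechanize sketch of the login. The form name `Form1`, the button `loginButton`, and the field names come from code elsewhere in this thread and have not been checked against the live site. `submit_form` fills in the named fields and submits whatever else the form already contains, hidden inputs included, which a hand-built POST can easily leave out. `login` is a hypothetical helper; defining it fetches nothing.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Hypothetical helper: fetch the login page, submit the form, return the agent.
# Form, field and button names are assumptions taken from this thread.
sub login {
    my ( $url, $user, $pass ) = @_;

    # autocheck => 1 makes Mechanize die on any failed request;
    # cookies are kept in an in-memory jar automatically.
    my $mech = WWW::Mechanize->new( autocheck => 1 );

    $mech->get($url);
    $mech->submit_form(
        form_name => 'Form1',
        fields    => { txtUserName => $user, txtUserPass => $pass },
        button    => 'loginButton',
    );
    return $mech;
}

# Usage (performs real HTTPS requests):
#   my $mech = login('https://noii.nasdaqtrader.com/', 'user@example.com', 'secret');
#   print $mech->content;
```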
      I figured it wasn't a problem because it's free to register and it's not the login I am using; it was one I created just for this post. As for the cookie thing, I am really not sure how I would go about saving them or passing them. Any chance you have a snippet, or a link to them in the docs? Thanks

        In that case, it does seem reasonable that you posted a username and password. However, given how foolishly some people in the past have behaved here (posting real data), we frequently react quickly to remove such information. In the future, if you could clarify that it's test info, that would be helpful :)

        Cheers,
        Ovid

        New address of my CGI Course.

Re: LWP and Site Logins
by talexb (Chancellor) on Aug 04, 2004 at 18:17 UTC

    The following code fragment is something that works for me ..

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTTP::Cookies;
    use HTTP::Request::Common;
    use URI;

    {
        my $ua = LWP::UserAgent->new;
        $ua->cookie_jar( HTTP::Cookies->new );
        my $webPage  = "https://noii.nasdaqtrader.com/";
        my $username = '*****';    # placeholder credentials
        my $password = '*****';
        my $res;

        # Log in.
        {
            my $uri = URI->new($webPage);
            $uri->query_form(
                'txtUserName' => $username,
                'txtUserPass' => $password,    # field name used in the original post
            );
            $res = $ua->get($uri);
            die "Unable to log in: " . $res->status_line
                unless $res->is_success;
        }

        # Continue with logged in page ..
    }

    Also, note that a 302 is not really an error; it's a redirection status. It means that the web server is redirecting you to another page.

    Alex / talexb / Toronto

    Life is short: get busy!

      Hrm. When I try that it seems to give me a 400 error.
        Correction: it didn't work, but for a different reason. I had a typo from the first try. When I tried your method with the URI post, it still prints out the main page when I do a print $res->content.
Re: LWP and Site Logins
by Mr_Jon (Monk) on Aug 04, 2004 at 18:22 UTC
    If on Win32 you could try using the Win32::IE::Mechanize module and let IE take care of trickier issues like https and cookie handling:
    #!/usr/bin/perl -w
    use strict;
    use Win32::IE::Mechanize;

    my $ie = Win32::IE::Mechanize->new( visible => 1 );
    $ie->get('https://noii.nasdaqtrader.com');
    $ie->form_name('Form1');
    $ie->set_fields(
        txtUserName => 'email@wherever.com',
        txtUserPass => 'password'
    );
    $ie->click('loginButton');
    You can dump the HTML at any time for parsing with the $ie->content method.
      I am using Unix. And although it seems these Mechanize modules are being recommended, I would like to get it working using just the LWP modules for now. Thanks though.
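Staying with plain LWP is possible: HTML::Form (shipped with libwww-perl for years, later split into its own distribution) can parse the login form out of the page and build the POST for you, hidden fields included. That could matter here if, as the `Form1`/`loginButton` names and the IIS server header hint, this is an ASP.NET page; such pages typically reject posts that omit hidden fields like `__VIEWSTATE`. That is a guess, not something confirmed in this thread. The sketch runs offline against a hypothetical copy of the form (the `login.aspx` action and the hidden value are made up); against the live site you would parse the response from `$ua->get($url)` instead and send the result with `$ua->request($req)`.

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical stand-in for the login page; the real markup may differ.
my $html = <<'HTML';
<form name="Form1" method="post" action="login.aspx">
  <input type="hidden" name="__VIEWSTATE" value="dDwtMTIz">
  <input type="text" name="txtUserName">
  <input type="password" name="txtUserPass">
  <input type="submit" name="loginButton" value="Log In">
</form>
HTML

# Parse every form, resolving relative actions against the page URL.
my @forms = HTML::Form->parse( $html, 'https://noii.nasdaqtrader.com/' );
my ($form) = grep { ( $_->attr('name') // '' ) eq 'Form1' } @forms;
die "login form not found\n" unless $form;

# Fill in the visible fields; hidden ones keep the values the server sent.
$form->value( txtUserName => 'user@example.com' );
$form->value( txtUserPass => 'secret' );

# click() builds the HTTP::Request a browser would send for that button.
my $req = $form->click('loginButton');
print $req->method, ' ', $req->uri, "\n", $req->content, "\n";

# Live use (cookie_jar set on $ua): my $res = $ua->request($req);
```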
Re: LWP and Site Logins
by LTjake (Prior) on Aug 05, 2004 at 14:37 UTC

    An underused trick is to use LWP::Debug in your script. This will give you some indication as to what is going on during the request/response process.

    As you've noted, your script gets back a 302 response, which is a redirect. By default, LWP::UserAgent automatically follows redirects only for GET and HEAD requests. Adding POST to its requests_redirectable list will get you to the proper page.
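For an agent that already exists, such as `$browser` in the original post, there's no need to rebuild it: `requests_redirectable` returns the array reference LWP consults, and the LWP::UserAgent documentation shows pushing POST onto it. A quick offline check:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;    # stands in for an already-built agent

# The default list holds GET and HEAD; let LWP follow redirects after a POST too.
push @{ $browser->requests_redirectable }, 'POST';

print join( ', ', @{ $browser->requests_redirectable } ), "\n";
```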

    Assuming you've entered a proper username and password, you'll move on; otherwise, you'll be back at the login page.

    use strict;
    use warnings;
    use HTTP::Cookies;
    use LWP::UserAgent;
    use LWP::Debug qw( + );

    my $url   = 'https://noii.nasdaqtrader.com/';
    my $agent = LWP::UserAgent->new(
        cookie_jar            => HTTP::Cookies->new,
        requests_redirectable => [ 'GET', 'HEAD', 'POST' ]
    );

    my $response = $agent->post( $url,
        { txtUserName => '***', txtUserPass => '***' } );

    print $response->content;
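If `use LWP::Debug qw( + )` prints nothing on a newer libwww-perl, that is probably because, as far as I know, LWP itself no longer calls into LWP::Debug; the LWP::Debug documentation points to the user agent's handler hooks instead. A sketch of that approach: `enable_tracing` (a name of my own) registers `request_send` and `response_done` handlers that dump each message to a filehandle. The demo uses a `data:` URL so it runs without network access; on the real site the same handlers should show each request and response in the redirect chain.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper: dump every outgoing request and finished response to $fh.
sub enable_tracing {
    my ( $ua, $fh ) = @_;
    $fh ||= \*STDERR;
    # Returning nothing from a request_send handler lets the request proceed.
    $ua->add_handler( request_send  => sub { print {$fh} $_[0]->dump; return } );
    $ua->add_handler( response_done => sub { print {$fh} $_[0]->dump; return } );
    return $ua;
}

my $ua  = enable_tracing( LWP::UserAgent->new );
my $res = $ua->get('data:,hello');    # traced to STDERR like any other request
```

Unlike a packet sniffer, these dumps happen inside LWP, so HTTPS traffic shows up in plain text.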

    --
    "Go up to the next female stranger you see and tell her that her "body is a wonderland."
    My hypothesis is that she’ll be too busy laughing at you to even bother slapping you.
    " (src)

      I'm having strange behavior on several fronts.

      First, when I capture the actual web traffic, it's the same (almost) for both my browser, and for LWP, except that the packets captured by Ethereal when using the browser suddenly stop in the middle of the transmission. With LWP, you see the entire first web page; with the browser (Firefox 2), it ends like this:

      function yadda(...
      // check for redirection
      return redirectCheck(pluginFound, redire
      Consistently, it cuts off receiving after "redire" in Firefox, although it receives the entire TCP stream when using Internet Explorer. This is not a cache issue; I cleared the cache before trying with both browsers.

      Second, LWP handles the redirect automatically, as evidenced by the fact that, although the headers that Ethereal sees say

      HTTP/1.1 302 Object Temporarily Moved
      Connection: close
      Date: Sun, 01 Jul 2007 03:53:25 GMT
      Server: Microsoft-IIS/6.0
      location: https://<...etc>
      LWP's $response->status_line returns only "200 OK". However, that "200 OK" response is not captured by Ethereal. My program receives it, but Ethereal claims it's never been received over port 80. AND, LWP does not receive the redirected page in the response. Neither does the response contain the original page. The response that LWP saves is some third page, neither the original, nor the page redirected to.

      Third, the web browser continues on to display the next webpage, even though no more traffic was captured by Ethereal.

      The code looks like this:

      my $ua = LWP::UserAgent->new( requests_redirectable => [ 'GET', 'HEAD', 'POST' ] );
      ...
      $response = $ua->get($uri, @headers);

      # Handle redirects
      # This code never actually executes - LWP does it automatically
      # That's why you never see the redirect message
      while ($response->is_redirect) {
          my $location = $response->header("location");
          print " " x $level . "Redirected to $location\n";
          $response = $ua->get($location);
      }
      $page    = $response->content;
      $success = $response->is_success;
      if (!$success) {
          print "LWP ERR: " . $response->status_line . "\n";
      }
      Am I supposed to use LWP::Redirect? I find there is a module of that name, but no documentation mentions it.
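When LWP follows redirects itself, the intermediate responses are not thrown away: each response links to the one before it through `previous`. Walking that chain might show where an unexpected "third page" came from. A sketch, with `show_chain` as a hypothetical helper; the demo builds a two-hop chain offline from made-up example.com URLs rather than real traffic.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;

# Hypothetical helper: one line per hop, oldest first (status, URL, Location).
sub show_chain {
    my ($res) = @_;
    my @hops;
    for ( my $r = $res; $r; $r = $r->previous ) {
        my $url = $r->request ? $r->request->uri : '?';
        my $loc = $r->header('Location');
        unshift @hops, $r->status_line . " $url" . ( defined $loc ? " -> $loc" : '' );
    }
    return @hops;
}

# Offline demo: a 302 followed by the page it pointed to.
my $first = HTTP::Response->new( 302, 'Object Temporarily Moved',
    [ Location => 'https://example.com/next' ] );
$first->request( HTTP::Request->new( GET => 'https://example.com/' ) );

my $final = HTTP::Response->new( 200, 'OK' );
$final->request( HTTP::Request->new( GET => 'https://example.com/next' ) );
$final->previous($first);

print "$_\n" for show_chain($final);
```

On a live response, call `show_chain($response)` right after `$ua->get(...)`.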

        It's really hard to tell exactly what's going on -- but...

        http is port 80, and https is port 443 -- this may explain some things if all you're doing is monitoring port 80 (though I've never used ethereal myself).

        If you don't want LWP to auto-follow redirects, change the requests_redirectable line to

        requests_redirectable => []
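With `requests_redirectable => []` the 302 comes back to you, and you can follow it yourself. A sketch of such a loop, with two choices of my own: a hop limit so a redirect cycle can't spin forever, and `URI->new_abs` to resolve the `Location` header, which may be relative or absolute (simple string concatenation only works for the relative case). `follow_redirects` is a hypothetical helper name; it assumes the agent carries a cookie jar so the login session survives each hop, and it always re-requests with GET, as browsers do for 302/303.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: follow up to $max_hops redirects by hand with GET.
sub follow_redirects {
    my ( $ua, $res, $max_hops ) = @_;
    $max_hops //= 5;
    while ( $res->is_redirect && $max_hops-- > 0 ) {
        my $loc = $res->header('Location') or last;
        # Resolve a relative Location against the URL that produced this response.
        my $next = URI->new_abs( $loc, $res->base );
        $res = $ua->get($next);
    }
    return $res;
}

# Usage with a live agent:
#   my $ua  = LWP::UserAgent->new( cookie_jar => {}, requests_redirectable => [] );
#   my $res = follow_redirects( $ua, $ua->post( $url, \%form ) );
```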

        If you really do want to auto-follow redirects but it's not going to the same page, there may be some user-agent-based filtering happening on the server side (it's really hard to tell without seeing the actual output).
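If user-agent filtering is a suspect, one cheap experiment is to send a browser-like `User-Agent` header instead of LWP's default `libwww-perl/<version>` string. The string below is only an example, not a value the site is known to expect.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical browser-like identity for the experiment.
my $ua = LWP::UserAgent->new(
    agent      => 'Mozilla/5.0 (X11; Linux x86_64; rv:2.0) Gecko/20100101 Firefox/4.0',
    cookie_jar => {},    # in-memory jar so a login session still sticks
);

print 'User-Agent: ', $ua->agent, "\n";
```

If the page LWP receives changes after this, the server is likely varying its response by client.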

        Firefox's LiveHTTPHeaders plugin is a really great way to see the request/response flow; it may be a little easier to see what's going on with it instead.
