Hi all, I'm trying to script command-line scraping of a vendor website hosted at my company. After login, the site sends you through many levels of redirection. Firefox and Chrome handle this fine, but LWP ends up with a "Bad Request" response. Specifically, the "SetSessionVars.php" request (see the trace below) returns 400 Bad Request, whereas in the browser it successfully redirects to the home page. For the life of me I can't figure out what I'm not doing. Here's my code:

    my $ua = LWP::UserAgent->new();
    push @{ $ua->requests_redirectable }, 'POST';
    my $cookies = new HTTP::Cookies(file=>'/Users/jcabraham/.cookies.txt', autosave=>1, ignore_discard=>1);
    $ua->cookie_jar($cookies);
    $ua->default_header('Accept-Encoding' => scalar HTTP::Message::decodable());
    $ua->add_handler("request_send", sub { shift->dump; return });
    $ua->add_handler("response_done", sub { shift->dump; return });

    # log off first, just start clean
    my $auth_response = $ua->request(GET "http://ap1492-dsr/LogOff.php");

    # now login
    my $response = $ua->request(POST "http://ap1492-dsr/authenticate.php", [user => $authUser, password => $authPw, TimezoneOffset => 14400, submit => 'User Login']);

    # scrape home page
    $response = $ua->request(GET "http://ap1492-dsr/Welcome.php");
    if ($response->is_success) {
        my $html = $response->decoded_content;
        print $html;
    }

And here's the trace output from LWP:

    macbook:scripts jcabraham$ link_aperio.pl 12 12

    GET http://ap1492-dsr/LogOff.php
    Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
    User-Agent: libwww-perl/5.837
    Cookie: PHPSESSID=1342557122; DontShowDisclaimer80=1
    Cookie2: $Version="1"

    (no content)

    HTTP/1.1 302 Found
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Connection: close
    Date: Thu, 19 Jul 2012 20:23:13 GMT
    Pragma: no-cache
    Location: Login.php
    Server: Apache
    Content-Length: 0
    Content-Type: text/html; charset=UTF-8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Client-Date: Thu, 19 Jul 2012 20:23:13 GMT
    Client-Peer: 10.100.50.80:80
    Client-Response-Num: 1
    X-Powered-By: PHP/5.3.5

    (no content)

    GET http://ap1492-dsr/Login.php
    Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
    User-Agent: libwww-perl/5.837
    Cookie: PHPSESSID=1342557122; DontShowDisclaimer80=1
    Cookie2: $Version="1"

    (no content)

    HTTP/1.1 200 OK
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Connection: close
    Date: Thu, 19 Jul 2012 20:23:13 GMT
    Pragma: no-cache
    Server: Apache
    Content-Length: 5078
    Content-Type: text/html; charset=UTF-8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
    Client-Peer: 10.100.50.80:80
    Client-Response-Num: 1
    Link: <./CSS/masterstyle.css?11.1.1.760>; rel="stylesheet"; type="text/css"
    Link: <./CSS/blue.css?11.1.1.760>; rel="stylesheet"; type="text/css"
    Link: <./CSS/blueLogin.css?11.1.1.760>; rel="stylesheet"; type="text/css"
    Link: <./CSS/custom.css?11.1.1.760>; rel="stylesheet"; type="text/css"
    Refresh: text/html
    Set-Cookie: memory_limit=deleted; expires=Wed, 20-Jul-2011 20:23:12 GMT; path=/
    Set-Cookie: PHPSESSID=1342729393; path=/
    Set-Cookie: PHPSESSID=681877b8eaa1b7fd3a35cc9db713cfa7; path=/
    Set-Cookie: PHPSESSID=1342557122; path=/; httponly
    Title: Spectrum - Login
    X-Powered-By: PHP/5.3.5

    \r
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/loose.dtd"><html><head><meta content='text/html' http-equiv='refresh'>
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><TITLE>Spectrum - Login</TITLE>
    <link type='text/css' rel='stylesheet' href='./CSS/masterstyle.css?11.1.1.760'>
    <script type='text/javascript' src='./Spectrum.js?11.1.1.760'> </script>
    <script type='text/javascript' src='./Keyboard.js?11.1.1.760'> </script>
    <script type='text/javascript' src='....
    (+ 4566 more bytes not shown)

    POST http://ap1492-dsr/authenticate.php
    Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
    User-Agent: libwww-perl/5.837
    Content-Length: 70
    Content-Type: application/x-www-form-urlencoded
    Cookie: PHPSESSID=1342557122; DontShowDisclaimer80=1
    Cookie2: $Version="1"

    user=jabraham&password=da!syd0g&TimezoneOffset=14400&submit=User+Login

    HTTP/1.1 302 Found
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Connection: close
    Date: Thu, 19 Jul 2012 20:23:14 GMT
    Pragma: no-cache
    Location: Disclaimer.php
    Server: Apache
    Content-Length: 0
    Content-Type: text/html; charset=UTF-8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
    Client-Peer: 10.100.50.80:80
    Client-Response-Num: 1
    Set-Cookie: PHPSESSID=1342729394; path=/
    X-Powered-By: PHP/5.3.5

    (no content)

    GET http://ap1492-dsr/Disclaimer.php
    Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
    User-Agent: libwww-perl/5.837
    Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
    Cookie2: $Version="1"

    (no content)

    HTTP/1.1 302 Found
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Connection: close
    Date: Thu, 19 Jul 2012 20:23:14 GMT
    Pragma: no-cache
    Location: DetermineRole.php
    Server: Apache
    Content-Length: 0
    Content-Type: text/html; charset=UTF-8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
    Client-Peer: 10.100.50.80:80
    Client-Response-Num: 1
    X-Powered-By: PHP/5.3.5

    (no content)

    GET http://ap1492-dsr/DetermineRole.php
    Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
    User-Agent: libwww-perl/5.837
    Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
    Cookie2: $Version="1"

    (no content)

    HTTP/1.1 302 Found
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Connection: close
    Date: Thu, 19 Jul 2012 20:23:14 GMT
    Pragma: no-cache
    Location: DetermineHierarchy.php?RoleId=102&HierarchyId=3
    Server: Apache
    Content-Length: 0
    Content-Type: text/html; charset=UTF-8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
    Client-Peer: 10.100.50.80:80
    Client-Response-Num: 1
    X-Powered-By: PHP/5.3.5

    (no content)

    GET http://ap1492-dsr/DetermineHierarchy.php?RoleId=102&HierarchyId=3
    Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
    User-Agent: libwww-perl/5.837
    Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
    Cookie2: $Version="1"

    (no content)

    HTTP/1.1 302 Found
    Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
    Connection: close
    Date: Thu, 19 Jul 2012 20:23:14 GMT
    Pragma: no-cache
    Location: ../SetSessionVars.php?RoleId=102&HierarchyId=3
    Server: Apache
    Content-Length: 0
    Content-Type: text/html; charset=UTF-8
    Expires: Thu, 19 Nov 1981 08:52:00 GMT
    Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
    Client-Peer: 10.100.50.80:80
    Client-Response-Num: 1
    X-Powered-By: PHP/5.3.5

    (no content)

    GET http://ap1492-dsr/../SetSessionVars.php?RoleId=102&HierarchyId=3
    Accept-Encoding: gzip, x-gzip, deflate, x-bzip2
    User-Agent: libwww-perl/5.837
    Cookie: PHPSESSID=1342729394; DontShowDisclaimer80=1
    Cookie2: $Version="1"

    (no content)

    HTTP/1.1 400 Bad Request
    Connection: close
    Date: Thu, 19 Jul 2012 20:23:15 GMT
    Server: Apache
    Content-Length: 286
    Content-Type: text/html; charset=iso-8859-1
    Client-Date: Thu, 19 Jul 2012 20:23:14 GMT
    Client-Peer: 10.100.50.80:80
    Client-Response-Num: 1
    Title: 400 Bad Request

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>400 Bad Request</title>
    </head><body>
    <h1>Bad Request</h1>
    <p>Your browser sent a request that this server could not understand.<br />
    </p>
    <hr>
    <address>Apache Server at ap1492-dsr Port 80</address>
    </body></html>
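
One detail in the trace may be worth isolating: the last redirect carries "Location: ../SetSessionVars.php?...", and the request that gets the 400 goes to "http://ap1492-dsr/../SetSessionVars.php?...", with the ".." still in the path. Browsers drop ".." segments that would climb above the root, which might explain why they succeed. The sketch below is only a guess at the cause, not a confirmed diagnosis. It assumes the URI module (which LWP depends on) and uses its documented $URI::ABS_REMOTE_LEADING_DOTS switch; the variable names $location, $base, $kept and $dropped are mine.

```perl
use strict;
use warnings;
use URI;

# Location header and the URL it was received from, copied from the trace.
my $location = '../SetSessionVars.php?RoleId=102&HierarchyId=3';
my $base     = 'http://ap1492-dsr/DetermineHierarchy.php?RoleId=102&HierarchyId=3';

# Default URI.pm behaviour: ".." segments that would go above the
# root are kept in the resolved URL.
my $kept = URI->new($location)->abs($base);

# With the switch enabled, excess leading ".." segments are removed,
# which matches what browsers send.
my $dropped = do {
    local $URI::ABS_REMOTE_LEADING_DOTS = 1;
    URI->new($location)->abs($base);
};

print "default resolution: $kept\n";
print "leading dots dropped: $dropped\n";
```

If the first URL printed matches the failing request in the trace, then setting $URI::ABS_REMOTE_LEADING_DOTS = 1 near the top of the script, before the first $ua->request call, would be a cheap thing to try. That assumes LWP resolves redirect targets through URI's abs method, which I believe it does but have not verified against version 5.837.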


In reply to LWP fails where browser succeeds? by jcabraham
