cketcham has asked for the wisdom of the Perl Monks concerning the following question:

All,

I have the following code:

use 5.010;
use LWP::UserAgent;
use URI::URL;
use LWP::Debug qw(+);

my $browser = LWP::UserAgent->new();
$browser->agent('Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)');

my $url  = url 'http://www.investway.com';
my $resp = $browser->get($url);

# Save the response body so it can be opened in a browser.
open my $htmlFile, '>', 'c:\\temp\\temp5.html' or die "Cannot open output file: $!";
print $htmlFile $resp->content;

And here is the LWP::Debug output:

LWP::UserAgent::new: ()
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://www.investway.com/
LWP::UserAgent::_need_proxy: Not proxied
LWP::Protocol::http::request: ()
LWP::Protocol::collect: read 160 bytes
LWP::UserAgent::request: Simple response: Found
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://www.investway.com/ErrorPage.aspx?aspxerrorpath=/Section.aspx
LWP::UserAgent::_need_proxy: Not proxied
LWP::Protocol::http::request: ()
LWP::Protocol::collect: read 784 bytes
LWP::Protocol::collect: read 2242 bytes
LWP::UserAgent::request: Simple response: Internal Server Error

These errors only happen with this particular web-site (www.investway.com). It works just fine when I use just about any other web site. There's really not enough information put out by LWP::Debug for me to figure this out.

By the way, the temp5.html file produced by my code shows a web page with "Runtime Error" as its title. Just using Internet Explorer to connect to www.investway.com works fine and shows a normal web page. So, I figure the LWP::UserAgent is not sending something needed by this website.
Any help would be greatly appreciated.

Chuck

Re: Trouble with LWP::UserAgent with certain website
by poolpi (Hermit) on Jul 29, 2008 at 05:49 UTC

    This website sends you a cookie.
    You can use something like this:

    use LWP::UserAgent;
    use HTTP::Cookies;

    my $ua = LWP::UserAgent->new(
        agent      => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322)',
        cookie_jar => HTTP::Cookies->new(
            file           => 'cookies.txt',
            autosave       => 1,
            ignore_discard => 1,
        ),
    );
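To see what the cookie jar does between the redirect and the follow-up request, here is a minimal offline sketch. It assumes the first (302) response carries a Set-Cookie header, which the thread does not show directly; the host www.example.com and the session ID value are placeholders, not values from the real site. LWP::UserAgent makes the two jar calls itself when it is given a cookie_jar.

```perl
use strict;
use warnings;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;

# In-memory jar (no file). With cookie_jar => $jar, LWP::UserAgent
# calls extract_cookies and add_cookie_header for you.
my $jar = HTTP::Cookies->new;

# Stand-in for the first exchange: a redirect that sets a session cookie.
# The host and the cookie value are placeholders.
my $first_req = HTTP::Request->new( GET => 'http://www.example.com/' );
my $first_res = HTTP::Response->new(
    302, 'Found',
    [
        'Location'   => '/Section.aspx',
        'Set-Cookie' => 'ASP.NET_SessionId=examplesessionid123; path=/',
    ],
);
$first_res->request($first_req);    # the jar needs the originating URL
$jar->extract_cookies($first_res);

# Stand-in for the follow-up request: the jar adds the Cookie header.
my $second_req = HTTP::Request->new( GET => 'http://www.example.com/Section.aspx' );
$jar->add_cookie_header($second_req);

my $cookie_header = $second_req->header('Cookie') // '';
print "Cookie sent on follow-up: $cookie_header\n";
```

Without a jar, the second request goes out with no Cookie header at all, which is one plausible reason a session-based site would answer differently than it does for a browser.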

    UPDATE
    I tested the script and your problem comes from the ASP.NET_SessionId (see below)

    #!/usr/bin/perl -w
    use strict;
    use LWP::UserAgent;
    use LWP::Debug qw(+);

    my $ua  = LWP::UserAgent->new();
    my $url = q{http://www.investway.com};

    # Browser-like request headers, plus the ASP.NET session cookie.
    my @headers = (
        'User-Agent'      => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.1.14) Gecko/20080404 Iceweasel/2.0.0.14 (Debian-2.0.0.14-2)',
        'Accept-Language' => 'en-us,en;q=0.5',
        'Accept-Charset'  => 'ISO-8859-1,utf-8;q=0.7,*;q=0.7',
        'Accept-Encoding' => 'gzip,deflate',
        'Accept'          => "image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*",
        Cookie            => 'ASP.NET_SessionId=l0gfoxzlneisjjfimjng23v1',
    );

    my $res = $ua->get( $url, @headers );

    if ( $res->is_success ) {
        print $res->headers_as_string, "\n";
    }
    else {
        print $res->status_line . "\n";
    }

    Output:

    LWP::UserAgent::new: ()
    LWP::UserAgent::request: ()
    LWP::UserAgent::send_request: GET http://www.investway.com
    LWP::Protocol::http::request: ()
    LWP::Protocol::collect: read 552 bytes
    LWP::Protocol::collect: read 570 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1368 bytes
    LWP::Protocol::collect: read 1028 bytes
    LWP::UserAgent::request: Simple response: OK

    Cache-Control: private
    Date: Tue, 29 Jul 2008 09:34:01 GMT
    Server: Microsoft-IIS/6.0
    Content-Length: 19934
    Content-Type: text/html; charset=utf-8
    Client-Date: Tue, 29 Jul 2008 09:33:51 GMT
    Client-Peer: 10.154.68.6:8080
    Client-Response-Num: 1
    Link: <Includes/PropertyDetail.css>; /="/"; rel="stylesheet"; type="text/css"
    Link: <Includes/thickbox.css>; /="/"; media="screen"; rel="stylesheet"; type="text/css"
    Link: <Images/favicon.ico>; /="/"; rel="shortcut icon"
    Title: Investway
    X-AspNet-Version: 2.0.50727
    X-Powered-By: ASP.NET
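When passing hand-built headers like this, it can help to preview exactly what will go over the wire before contacting the server. The following is a small offline sketch using HTTP::Request directly; the URL, the shortened User-Agent string, and the session ID are placeholders chosen for illustration.

```perl
use strict;
use warnings;
use HTTP::Request;

# Header name/value pairs in the same shape that $ua->get( $url, @headers ) accepts.
# Each value is the bare header value, without a "Name=" prefix.
my @headers = (
    'User-Agent'      => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US)',
    'Accept-Language' => 'en-us,en;q=0.5',
    'Cookie'          => 'ASP.NET_SessionId=examplesessionid123',
);

# Build the request without sending it, then dump it as text.
my $request = HTTP::Request->new( GET => 'http://www.example.com/', \@headers );
my $preview = $request->as_string;
print $preview;
```

Each header should appear once as "Name: value"; a line such as "User-Agent: User-Agent=..." would indicate that the header name was pasted into the value by mistake.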

    hth,
    PooLpi

    'Ebry haffa hoe hab im tik a bush'. Jamaican proverb
      I tried your new script and got this result:

      LWP::UserAgent::new: ()
      LWP::UserAgent::request: ()
      LWP::UserAgent::send_request: GET http://www.investway.com
      LWP::UserAgent::_need_proxy: Not proxied
      LWP::Protocol::http::request: ()
      LWP::Protocol::collect: read 160 bytes
      LWP::UserAgent::request: Simple response: Found
      LWP::UserAgent::request: ()
      LWP::UserAgent::send_request: GET http://www.investway.com/ErrorPage.aspx?aspxerrorpath=/Section.aspx
      LWP::UserAgent::_need_proxy: Not proxied
      LWP::Protocol::http::request: ()
      LWP::Protocol::collect: read 784 bytes
      LWP::Protocol::collect: read 2242 bytes
      LWP::UserAgent::request: Simple response: Internal Server Error
      500 Internal Server Error


      I am on a Windows XP platform. Also, I can block cookies in Internet Explorer and the web page still loads correctly in the browser window, so I am wondering why cookies would be significant. Also, could you please explain a little how you figured out cookies were being used by the web page?

        Have you read my update?

        More information:
        - Check your HTTP headers with a Firefox add-on, Tamper for example.
        - You may also want to read this article about ASP.NET sessions.
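The same header check can also be done from Perl, since LWP keeps every response in a redirect chain reachable through the previous method. The sketch below walks that chain and lists any Set-Cookie headers. It runs offline on hand-built stand-in responses; the assumption that the 302 response sets ASP.NET_SessionId is illustrative, and describe_chain is a hypothetical helper name, not part of LWP.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper: return the status line and Set-Cookie headers of a
# response and of every earlier response in its redirect chain, oldest first.
sub describe_chain {
    my ($res) = @_;
    my @report;
    for ( my $r = $res ; $r ; $r = $r->previous ) {
        my @cookies = $r->header('Set-Cookie');
        unshift @report, { status => $r->status_line, cookies => \@cookies };
    }
    return @report;
}

# Offline stand-ins for the two responses seen in the debug output.
# The cookie value is a placeholder.
my $redirect = HTTP::Response->new(
    302, 'Found',
    [ 'Set-Cookie' => 'ASP.NET_SessionId=examplesessionid123; path=/' ],
);
my $final = HTTP::Response->new( 500, 'Internal Server Error' );
$final->previous($redirect);

my @chain = describe_chain($final);
for my $step (@chain) {
    print $step->{status}, "\n";
    print "  Set-Cookie: $_\n" for @{ $step->{cookies} };
}
```

With a live request, the value returned by $browser->get($url) would be passed to describe_chain in place of $final.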

        Good luck ;)

        hth,
        PooLpi

        'Ebry haffa hoe hab im tik a bush'. Jamaican proverb