
By "pull down the page" I mean what is returned from the GET request.

I turned off JavaScript in the browser and the 'missing' data is still present in the returned page (i.e. disabling JavaScript has no effect on what gets returned; the browser still gets more data than Perl does).

I guess the best way I can describe this is:

Browser:
  Headers + Get => AxyzB

Perl:
  Headers + Get => AxB

where A, B, x, y and z are sections of the returned HTML. x, y and z are the sections associated with the tabbed areas.

As far as I know, I am sending the same headers from Perl that rexswain.com showed the browser sending.
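
One way to check that assumption without touching the network is to let LWP prepare a request and print the headers it has filled in. This is only a rough sketch: `prepared_headers` is a hypothetical helper name, the agent string and URL in the usage line are placeholders, and headers added later by the protocol layer (such as Host) will not appear in this output.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: return the headers LWP::UserAgent applies to a
# GET request for $url when configured with the given User-Agent string.
# Nothing is sent; prepare_request only copies the agent's default headers.
sub prepared_headers {
    my ($agent, $url) = @_;

    my $ua = LWP::UserAgent->new;
    $ua->agent($agent);

    my $req = HTTP::Request->new(GET => $url);
    $ua->prepare_request($req);

    return $req->headers->as_string;
}

# Placeholder values, for illustration only.
print prepared_headers('Mozilla/5.0 (example)', 'http://example.com/');
```

Comparing this output line by line with the header list shown by the browser-side tool makes any difference in the User-Agent string easy to spot.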

Re^3: lwp not retieving the same page as from a browser
by james2vegas (Chaplain) on Aug 26, 2009 at 08:01 UTC
    If I change your code to the following (replacing your User-Agent with the one used by rexswain.com, and setting it with the normal agent() call):
    use LWP::Simple;
    use LWP::UserAgent;
    $browser = LWP::UserAgent->new();
    $browser->agent('Mozilla/5.0 (X11; U; OpenBSD i386; en-US; rv:1.8.1.22) Gecko/20090626 SeaMonkey/1.1.17 XpcomViewer/0.9');
    $response = $browser->get('http://brtweb.phila.gov/brt.apps/Search/SearchResults.aspx?id=6546003202');
    print $response->content;

    I then get the same number of lines and the same amount of text as rexswain.com does, though I have not verified the content; can you check? Your User-Agent string returns a 41437-byte response, while the rexswain.com User-Agent (used above) returns 43314 bytes, which is the same size the rexswain.com form returns. Perhaps sending Mozilla/4.0 instead of 5.0 was triggering a code path in their ASP code that you would not see otherwise.
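
The size comparison described above can be sketched roughly as follows. This is a minimal sketch, not the exact script used in this thread: the function name `response_sizes` and the script name `compare_ua.pl` are hypothetical, and the URL and User-Agent strings are taken from the command line rather than hard-coded.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url once per User-Agent string and return a hash reference
# mapping each agent string to the byte length of the response body.
sub response_sizes {
    my ($url, @agents) = @_;
    my %size;

    for my $agent (@agents) {
        my $ua = LWP::UserAgent->new;
        $ua->agent($agent);

        my $response = $ua->get($url);

        # Store undef for failed requests so they are not mistaken
        # for legitimately empty pages.
        $size{$agent} = $response->is_success
            ? length($response->content)
            : undef;
    }

    return \%size;
}

# Usage (hypothetical script name): perl compare_ua.pl URL AGENT [AGENT ...]
if (@ARGV >= 2) {
    my ($url, @agents) = @ARGV;
    my $sizes = response_sizes($url, @agents);

    for my $agent (@agents) {
        my $n = $sizes->{$agent};
        printf "%s: %s\n", $agent, defined $n ? "$n bytes" : 'request failed';
    }
}
```

If two agent strings give different byte counts for the same URL, the server is probably varying its output on the User-Agent header, and diffing the two bodies should show which sections are dropped.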
      Yes, that did the trick! I would never have thought of that.

      Thank you for your help.