Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have a script that uses LWP::UserAgent to fetch and parse a web page. I've used the script to parse web pages in the past and it has worked fine. I'm currently trying to retrieve a web page that is generated using PHP (e.g. www.webpage.com/webpage.php). The page in question pulls information from a database and displays it. When browsing the page through IE, the top of the page (simple text) displays very quickly, but the information from the database takes some time to be retrieved and displayed, so there is a delay between seeing the top part of the page and the rest of the page loading.

My question is this: When I try to retrieve the web page through Perl, all the plain "text" information comes back properly, but the information that is pulled from the database is not retrieved by the script. All my script does is grab the web page:
use LWP::UserAgent;
use HTTP::Request;

sub getURL {
    my ($url, $thegoods, $givecookie, $savecookie, $redirect) = @_;
    my $ua       = LWP::UserAgent->new;
    my $request  = HTTP::Request->new('GET', $url);
    my $response = $ua->request($request);
    print $response->as_string();    # dump headers and body
}

My script runs and finishes very quickly. It takes much longer for the page to load in IE than for my script to run. It is almost as if the webpage is not retrieving the database information.
I know this question is vague, but does anybody have any idea why the information that is being pulled from the database is not being picked up by my script?
Thanks

Replies are listed 'Best First'.
Re: Parse PHP Web Page
by powerhouse (Friar) on Feb 09, 2003 at 07:55 UTC
    Well, generally speaking, server-to-server transfers will usually be much faster, since you're not using your ISP's connection, and most servers in data centers have a HUGE connection, generally much faster than a home/business connection.


    It could be that they are using some sort of "browser" test. So you might fake the browser type this way:
    sub getURL {
        my ($url, $thegoods, $givecookie, $savecookie, $redirect) = @_;
        my $ua = LWP::UserAgent->new;
        $ua->agent("Mozilla/8.0");    # pretend we are a capable browser
        my $request  = HTTP::Request->new('GET', $url);
        my $response = $ua->request($request);
        print $response->as_string();
    }


    Try that and see if it works.

    If that doesn't work, you might fake the referrer by adding this line before the request is sent:
    $request->referrer($site_url);    # get their home page and put it in that string...
    thx,
    Richard
Re: Parse PHP Web Page
by tachyon (Chancellor) on Feb 09, 2003 at 12:00 UTC

    Another possibility is that they are using a page refresh: when you hit the database you get a holding page, which is then replaced with a fresh page once the data is available. It is also possible that the data is being sent to a hidden frame and then moved into the page proper with JavaScript.
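
To check for the page-refresh case, one option is to look for a meta refresh tag in what the script actually received and then request the URL it names. The following is only a rough sketch: the helper name find_meta_refresh, the sample holding page and its ?ready=1 query string are made up for illustration, and the regex-based parsing is a shortcut that a proper HTML parser would handle more robustly.

```perl
use strict;
use warnings;

# Return the target URL of a <meta http-equiv="refresh"> tag, or undef.
# Regex-based on purpose to stay dependency-free; it assumes the content
# attribute is quoted and looks like "N; url=...".
sub find_meta_refresh {
    my ($html) = @_;
    for my $tag ($html =~ /(<meta\b[^>]*>)/gi) {
        next unless $tag =~ /http-equiv\s*=\s*["']?refresh\b/i;
        if ($tag =~ /content\s*=\s*(["'])\s*\d+\s*;\s*url\s*=\s*(.*?)\s*\1/i) {
            return $2;
        }
    }
    return;
}

# Hypothetical holding page, invented for this example.
my $holding_page = <<'HTML';
<html><head>
<meta http-equiv="refresh" content="3; url=http://www.webpage.com/webpage.php?ready=1">
</head><body>Loading, please wait...</body></html>
HTML

my $next = find_meta_refresh($holding_page);
print defined $next ? "Follow-up URL: $next\n" : "No meta refresh found\n";
```

If a URL comes back, fetching it with the same user agent (after a short sleep) would be the next thing to try. If nothing is found, the hidden-frame or JavaScript explanation becomes more likely, and looking for frame or script tags in the response is the natural next step.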

    If you can show us the page no doubt we can show you how to get the data.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

Re: Parse PHP Web Page
by steves (Curate) on Feb 09, 2003 at 07:11 UTC

    They might be using server push to deliver the page in multiple pieces. I've never handled server push with LWP, but I believe you can do it with a more complicated piece of LWP code that uses callbacks.
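
A minimal sketch of the callback approach, assuming the ':content_cb' option of LWP::UserAgent's get method. The names make_chunk_handler and fetch_in_chunks and the 120-second timeout are invented for this example. The callback only sees raw chunks as they arrive; if the server really does use multipart server push, splitting the parts apart is still up to the caller.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a callback that reports each chunk as it arrives and also
# accumulates the full body in the scalar that $buffer_ref points to.
sub make_chunk_handler {
    my ($buffer_ref) = @_;
    return sub {
        my ($chunk, $response, $protocol) = @_;
        $$buffer_ref .= $chunk;
        printf STDERR "received %d bytes (%d so far)\n",
            length($chunk), length($$buffer_ref);
    };
}

# Fetch $url, handing the body to the callback piece by piece instead
# of storing it in the response object.
sub fetch_in_chunks {
    my ($url) = @_;
    my $body = '';
    my $ua   = LWP::UserAgent->new(timeout => 120);    # timeout is a guess
    my $response = $ua->get($url, ':content_cb' => make_chunk_handler(\$body));
    die "Request failed: ", $response->status_line, "\n"
        unless $response->is_success;
    return $body;
}

# Only touch the network when a URL is given on the command line.
print fetch_in_chunks($ARGV[0]) if @ARGV;
```

Run with the page's URL as the only argument, the progress lines on STDERR show when each piece arrives, which should make it clear whether the server ever sends the database section to the script at all.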