TacoVendor has asked for the wisdom of the Perl Monks concerning the following question:

I haven't figured out the easiest way to describe what I need to do so that it is clear, but I will do my best here. It is probably easiest to start by explaining what works.

Using a web browser, I am browsing a site that serves its pages using PHP. Some links trigger a server-side script that pushes a URL back down to the browser with an instant page refresh. I call 'server.com/page1.php?script', and the browser next displays a URL of 'server.com/results.php=xxx'.

I can get Perl to request web pages and such without a problem; I just have no idea how to get Perl to see the results that are pushed down like this from a server-side script.

This specific script is running right now on ActivePerl 5.8.x on Win32. If I can get it working properly here, I can figure out how to port it to the other versions and OSes that I need to work with.

If more clarification is needed then ask in a reply and I will give whatever info I can.

Thanks.

Re: Browsing php pages properly?
by ikegami (Patriarch) on Feb 01, 2006 at 16:21 UTC

    If I understand correctly, you're trying to download web pages. One (or more) of them redirects you to another page, and you want to know how to download the page to which you've been redirected.

    LWP should already do this for you. Compare the request and simple_request methods of LWP::UserAgent.
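
    To make the comparison concrete, here is a minimal sketch. The helper name compare_requests is made up for illustration, and the URL is whatever you pass on the command line (for example the page1.php URL from the question); nothing is fetched unless a URL is given.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: fetch $url once without following redirects and
# once following them, and return both HTTP::Response objects.
sub compare_requests {
    my ($ua, $url) = @_;

    # simple_request() sends exactly one request and returns whatever comes
    # back, even if it is a 3xx redirect.
    my $raw = $ua->simple_request( HTTP::Request->new( GET => $url ) );

    # request() follows HTTP redirects until it reaches a final response.
    my $followed = $ua->request( HTTP::Request->new( GET => $url ) );

    return ($raw, $followed);
}

# Only touch the network when a URL is supplied on the command line.
if ( my $url = shift @ARGV ) {
    my $ua = LWP::UserAgent->new;
    my ($raw, $followed) = compare_requests($ua, $url);

    printf "simple_request: %s\n", $raw->status_line;
    printf "  Location: %s\n", $raw->header('Location') if $raw->is_redirect;

    printf "request:        %s\n", $followed->status_line;
    # The request attached to the final response holds the URL that was
    # actually fetched last.
    printf "  final URL:    %s\n", $followed->request->uri;
}
```

    If both calls report the same non-redirect status and the final URL is unchanged, the server is probably not sending an HTTP redirect at all, and the redirection is happening inside the page body.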

    However, LWP will only do this if the redirect is an HTTP redirect. If the redirect is done via HTML's META tags or via JavaScript, LWP cannot help you.

    WWW::Mechanize might process HTML META tags, but it definitely will not do JavaScript either.

    If the PHP emits JavaScript to perform the redirection, you might want to consider Win32::IE::Mechanize.

      It is JavaScript that is doing the redirect.

      My issue is focused on being able to see what URL the server has pushed. If the only option is to use the IE::Mechanize module to launch an IE instance to see the pushed URL, then so be it, but is there by chance a way to have Perl 'sit back and wait' like a browser to accept data from a server-side push like this?

        No, because LWP doesn't look at the response body, and because WWW::Mechanize doesn't pass the JavaScript to a JavaScript engine. I think there is a project to create a JavaScript engine in Perl, but I don't know its status.

        Now, if you're only dealing with a single .php (or a set that behave identically), you could search the JavaScript code for the URL using a simple regexp, then fetch that page yourself.
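
        A rough sketch of that regexp approach follows. It assumes the script redirects by assigning a quoted string to location or location.href, or by calling location.replace; the real page may do it differently, so look at the source first. The sub name extract_js_redirect and the sample HTML are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first quoted URL that is assigned to
# location / location.href or passed to location.replace, or nothing
# if no such statement is found.
sub extract_js_redirect {
    my ($html) = @_;
    if ( $html =~ m{
            \blocation (?: \.href )?           # location or location.href
            \s* (?: = | \.replace \s* \( ) \s* # assignment or replace call
            (["'])                             # opening quote, capture 1
            (.*?)                              # the URL, capture 2
            \1                                 # same quote closes it
        }x )
    {
        return $2;
    }
    return;
}

# Made-up page body, shaped like what the PHP script might send back.
my $body = <<'HTML';
<html><head><script type="text/javascript">
  window.location = 'results.php=xxx';
</script></head><body></body></html>
HTML

my $target = extract_js_redirect($body);
print defined $target ? "redirect target: $target\n" : "no redirect found\n";
```

        The extracted value may be relative. URI->new_abs($target, $response->base) would turn it into an absolute URL that can be handed to $ua->get, where $response is the HTTP::Response for the page containing the script.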