in reply to Browsing php pages properly?

If I understand correctly, you're trying to download web pages. One (or more) of them redirects you to another page, and you want to know how to download the page to which you've been redirected.

LWP should already do this for you. Compare the request and simple_request methods of LWP::UserAgent.
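To make the comparison concrete, here is a minimal sketch of the two calls side by side. The helper names final_url and first_redirect_target are made up for this post and are not part of LWP; the sketch assumes the redirect is a plain HTTP 3xx response.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical helper: request() follows HTTP redirects, so the request
# attached to the final response carries the URL you ended up at.
sub final_url {
    my ($ua, $url) = @_;
    my $response = $ua->request(HTTP::Request->new(GET => $url));
    return $response->request->uri->as_string;
}

# Hypothetical helper: simple_request() does NOT follow redirects, so you
# see the 3xx response itself and can read its Location header.
# Returns undef when the first response is not a redirect.
sub first_redirect_target {
    my ($ua, $url) = @_;
    my $response = $ua->simple_request(HTTP::Request->new(GET => $url));
    return $response->is_redirect ? $response->header('Location') : undef;
}
```

Call either one with an LWP::UserAgent object and the starting URL. Note that the Location header may be relative, in which case it has to be resolved against the URL you requested.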

However, LWP will only do this if the redirect is an HTTP redirect. If the redirect is done via HTML's META tags or via JavaScript, LWP cannot help you.

WWW::Mechanize might process HTML META tags, but it definitely will not do JavaScript either.
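If it turns out to be a META refresh, you can also pull the target out of the response body yourself. The following is only a sketch: meta_refresh_url is a hypothetical helper, and it assumes the usual content="N; url=..." form of the tag.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: return the URL named in the first
# <meta http-equiv="refresh" content="N; url=..."> tag, or undef if none.
sub meta_refresh_url {
    my ($html) = @_;
    my $parser = HTML::TokeParser->new(\$html);

    while (my $tag = $parser->get_tag('meta')) {
        my $attr = $tag->[1];    # hash ref of the tag's attributes
        next unless lc($attr->{'http-equiv'} // '') eq 'refresh';

        my $content = $attr->{content} // '';
        # Expect "<seconds>; url=<target>", with optional quotes around the target.
        return $1
            if $content =~ /^\s*\d+\s*[;,]\s*url\s*=\s*['"]?([^'"]+?)['"]?\s*$/i;
    }
    return undef;
}
```

Pass it the decoded content of the LWP response, then fetch the returned URL with a second request.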

If the PHP emits JavaScript to perform the redirection, you might want to consider Win32::IE::Mechanize.

Re^2: Browsing php pages properly?
by TacoVendor (Pilgrim) on Feb 01, 2006 at 16:40 UTC
    It is JavaScript that is doing the redirect.

    My issue is being able to see what URL the server has pushed. If the only option is to use the IE::Mechanize module to launch an IE instance to see the pushed URL, then so be it. But is there by chance a way to have Perl 'sit back and wait' like a browser and accept data from a server-side push like this?

      No, because LWP doesn't look at the response body, and because WWW::Mechanize doesn't pass the JavaScript to a JavaScript engine. I think there is a project to create a JavaScript engine in Perl, but I don't know its status.

      Now, if you're only dealing with a single .php (or a set that behave identically), you could search the JavaScript code for the URL using a simple regexp, then fetch that page yourself.
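      A rough sketch of that approach, assuming the script assigns a literal string to location or location.href, or passes one to location.replace() or location.assign(). The helper name js_redirect_url is made up for this post, and it will not work if the script builds the URL by concatenation.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: find a literal redirect target in the page's
# JavaScript and resolve it against the URL of the page it came from.
# Returns undef when no recognisable assignment is found.
sub js_redirect_url {
    my ($body, $base) = @_;

    if (   $body =~ /\blocation(?:\.href)?\s*=\s*(["'])(.*?)\1/
        || $body =~ /\blocation\.(?:replace|assign)\(\s*(["'])(.*?)\1/ )
    {
        # $2 holds the quoted string from whichever pattern matched.
        return URI->new_abs($2, $base)->as_string;
    }
    return undef;
}
```

Feed it the body of the response you got from LWP plus the URL you requested, then fetch whatever it returns.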

        I cannot search the JavaScript code for the URL, since the returned URL is generated at the time the link on the first page is called. The data after the '=' in the URL is used to generate the page itself when it is requested by the browser.

        After looking over the IE::Mech module, it looks like it will work for what I need. The integration of the OLE module will let me grab the pushed URL, and I should be OK at that point.

        ikegame, thanks for the info, especially the reference to IE::Mech. I hadn't come across that one before and would not have even thought about doing this that way.