monkster has asked for the wisdom of the Perl Monks concerning the following question:

I have been using Perl modules like WWW::Mechanize to fetch HTML pages, but I am not able to get pages that have dynamic content or are AJAX-based. Please suggest ways to retrieve those pages, and also how to handle JavaScript in Perl. Thanks.

Replies are listed 'Best First'.
Re: how to get dynamic html pages??
by blue_cowdawg (Monsignor) on Jan 25, 2008 at 07:07 UTC
        Please suggest ways to retrieve those pages.

    Not sure what you mean by that, dear Monk. The technology used to generate a web page has no bearing on the ability of a browser or browser-alike to retrieve it.

    The simplest example I can think of to accomplish this comes right from the man page of LWP::UserAgent:

        require LWP::UserAgent;

        my $ua = LWP::UserAgent->new;
        $ua->timeout(10);
        $ua->env_proxy;

        my $response = $ua->get('http://some.url.com/');

        if ($response->is_success) {
            print $response->content;  # or whatever
        }
        else {
            die $response->status_line;
        }


    Peter L. Berghold -- Unix Professional
    Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
      For example, the http://beta.nasdaq.com/ page fetched through LWP::UserAgent doesn't have all the data that you can see on screen. I think it uses a lot of dynamic data. I guess completely emulating a browser (like Firefox) in executing the page may help. Can you please propose some ways to do that?
            Can you please propose some ways to do that?

        Conceptually, what you need to do is parse the HTML you have read in from a page, look for URIs pointing to other resources, fetch those, act on those, lather, scrub, rinse, repeat as necessary...


        Peter L. Berghold -- Unix Professional
        Peter -at- Berghold -dot- Net; AOL IM redcowdawg Yahoo IM: blue_cowdawg
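
The fetch, parse, fetch-again loop described above could be sketched roughly as follows. This is only an illustration, assuming the HTML::LinkExtor module (from the HTML-Parser distribution) is installed alongside LWP. The helper name resource_urls and the choice to follow only script, frame and iframe sources are assumptions made for this example, not something from the thread. Note that it does not execute any JavaScript; it only retrieves the files a page refers to, so any data a script loads at run time would still have to be located by reading those scripts.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;

# Hypothetical helper: return the absolute URLs of scripts and frames
# referenced by a chunk of HTML. $base is used to resolve relative links.
sub resource_urls {
    my ($html, $base) = @_;
    my @urls;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # Assumption: only script and frame sources are of interest here.
            return unless $tag eq 'script' || $tag eq 'iframe' || $tag eq 'frame';
            push @urls, "$attr{src}" if defined $attr{src};
        },
        $base,
    );
    $parser->parse($html);
    $parser->eof;
    return @urls;
}

# Only touch the network when a start URL is given on the command line.
if (@ARGV) {
    my $ua = LWP::UserAgent->new;
    $ua->timeout(10);
    $ua->env_proxy;

    my $page = $ua->get($ARGV[0]);
    die $page->status_line unless $page->is_success;

    # Fetch each referenced resource and report what came back.
    for my $url (resource_urls($page->decoded_content, $page->base)) {
        my $res = $ua->get($url);
        printf "%s %s (%d bytes)\n", $res->code, $url, length $res->content;
    }
}
```

Run it with a start URL as the only argument. From there, the "repeat as necessary" step would mean inspecting the fetched scripts for the URLs they request and fetching those in turn.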
Re: how to get dynamic html pages??
by pc88mxer (Vicar) on Jan 25, 2008 at 08:03 UTC
    I would use Firefox itself to do this. I remember that Netscape was somewhat controllable from the command line - you could tell the currently running instance to load a certain page, save a page, etc. If Firefox is controllable in the same fashion I would look into that kind of solution.
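
    As a rough sketch of the simplest form of this, Perl can ask Firefox to open a page by running it with the URL as an argument. This assumes an executable named firefox is on the PATH; the helper name firefox_command is made up for the example. It only opens the page in a browser window, so it needs a display and does not hand the rendered HTML back to the script.

```perl
use strict;
use warnings;

# Hypothetical helper: build the command line that opens a URL in Firefox.
# Assumes the executable is called "firefox" and is on the PATH.
sub firefox_command {
    my ($url) = @_;
    return ('firefox', $url);
}

# Launch the browser only when a URL is given on the command line.
if (@ARGV) {
    my @cmd = firefox_command($ARGV[0]);
    # The list form of system() avoids passing the URL through a shell.
    system(@cmd) == 0
        or die "Could not run @cmd (status $?)\n";
}
```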

    Otherwise, you can probably do what you want with JavaScript if you can bypass the "same origin policy". Just write a page that loads your target page into another window, and write the JavaScript you need to emulate button clicks, text entry, etc.

    Finally, you might be able to find a Firefox plug-in which does this, or you can write your own plug-in.

      Yeah, this is possible, but how can we load an HTML page in Firefox from Perl? Also, using a GUI is not preferred. I'll check out that JavaScript approach, though. Thanks :)