mrguy123 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks

I am working on a tool that retrieves metadata from multiple web sites and displays the data together for University students. I usually retrieve the data via LWP::UserAgent, and then parse the relevant info.

Lately, I have encountered web sites that generate their data dynamically (I assume using Ajax and JavaScript). When I view the source of these sites via Firefox, I don't get any relevant info. When I click "View Generated Source", I get the data I need. Here is an example of such a site (just do a search and view the source).

My question is: is it possible to fetch the generated source via Perl? I am encountering more and more of these sites and am completely stuck when I try to parse them.

Any ideas?
Thanks,
MrGuy


Artificial Intelligence stands no chance against Natural Stupidity.

Replies are listed 'Best First'.
Re: Fetch generated source
by Ovid (Cardinal) on Oct 23, 2011 at 10:20 UTC

    First, check the terms and conditions for the Web site you're concerned about. If they allow this behavior, most modern browsers give you tools to track the AJAX calls. In this case, I used Chrome, right-clicked on the window and chose "Inspect Element" (or "alt-cmd-i"). From there, click the "Network" tab, run your search, and you'll see the traffic.

    When I searched for "Perl", I saw that a GET request was issued to this search URL and that it returned an easily parseable XML document. From there, it's up to you.
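
    A minimal sketch of that approach, assuming the search URL returns XML and that the interesting values sit in <title> elements (the element name and the helper names extract_titles and fetch_xml are illustrative, not taken from the site). The URL is passed on the command line exactly as it appears in the Network tab. XML::LibXML is just one of several parsers that would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use XML::LibXML;

# Return the text of every <title> element in an XML string.
# The element name is an assumption; adjust the XPath to the real response.
sub extract_titles {
    my ($xml) = @_;
    my $doc = XML::LibXML->load_xml(string => $xml);
    return map { $_->textContent } $doc->findnodes('//title');
}

# Fetch the URL as given (still percent-encoded) and return the raw XML bytes.
sub fetch_xml {
    my ($url) = @_;
    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->get($url);
    die "Request failed: ", $res->status_line, "\n" unless $res->is_success;
    # charset => 'none' undoes any Content-Encoding (e.g. gzip) but leaves
    # the character encoding to the XML parser.
    return $res->decoded_content(charset => 'none');
}

# Only touch the network when a URL is supplied on the command line.
if (my $url = shift @ARGV) {
    print "$_\n" for extract_titles(fetch_xml($url));
}
```

    Run it as: perl fetch_titles.pl 'URL-copied-from-the-Network-tab' (the script name is arbitrary; the single quotes keep the shell from interpreting any & in the query string).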

      Thanks for the tip. I had found this URL before, but I tried to access it after percent-decoding it, which was the wrong way to go.
      I hope this method will also help me with other sites.
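
      To illustrate one way percent-decoding can break such a URL, here is a small sketch using URI::Escape. The host, the parameter name q, and the search term are made up for the example; the real ones come from the browser's Network tab.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape uri_unescape);

# Hypothetical query value containing characters that are special in a URL.
my $term    = 'perl & ajax';
my $encoded = uri_escape($term);        # perl%20%26%20ajax
my $decoded = uri_unescape($encoded);   # perl & ajax

# Encoded form: the server receives a single parameter, q = "perl & ajax".
my $good = "http://example.com/search?q=$encoded";

# Decoded form: the raw "&" ends the q parameter early, and the bare
# spaces are not legal in a URL.
my $bad  = "http://example.com/search?q=$decoded";

print "$good\n$bad\n";
```

      In other words, the URL should be handed to LWP::UserAgent in the same encoded form the browser sent.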
Re: Fetch generated source
by Corion (Patriarch) on Oct 23, 2011 at 10:19 UTC
      Thanks for the suggestions. I will look into these modules to see if they solve my problem.