thekestrel has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,
I am trying to retrieve some data from a web page that loads a script using the DEFER attribute, as follows:
<script language="javascript1.1" defer src="a_script.js">

This is similar to what you see when searching a site like Expedia, which displays an interim page telling you it's off looking for hotels or flights or whatever whilst loading the results.
The DEFER keyword tells the browser that it can continue processing the page because the script produces no displayed output. As a result, my script captures the interim page, sees the DEFER keyword, finishes with the interim page and stops. The script does seem to produce output, however, because the results are displayed when it completes. This means that I can't get to the final data.
I tried downloading the script separately and inserting it into the page inside <SCRIPT> tags, whilst removing the script line containing DEFER, in an attempt to have the script run on one page, but that did not seem to work.
Has anyone had any experience with getting data in this fashion, and can you offer some advice?

Regards Paul

Replies are listed 'Best First'.
Re: Retrieving Deferred Content
by ikegami (Patriarch) on Jun 02, 2005 at 18:33 UTC
      Ikegami,
      Thanks for the response; that does look like it's in the right neck of the woods. Would you be able to explain why Win32::IE::Mechanize would address my issue where WWW::Mechanize does not?
      Win32::IE::Mechanize requires libwin32, which does not seem trivial to install at all, and it sounds like it requires a Windows C/C++ compiler to build some of the dependencies.

      Regards Paul

        WWW::Mechanize adds HTML parsing to LWP's understanding of HTTP. Neither understands JavaScript. Win32::IE::Mechanize, on the other hand, is an interface to Internet Explorer. It asks IE to fetch a page and process it, including any JavaScript in it.

        Do a search of this site for Mechanize and/or JavaScript to read more on the subject.
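
To illustrate the point that neither module runs JavaScript, here is a minimal sketch in core-only Perl of what an HTML-level scraper can do with the interim page: the deferred script tag is inert markup, and the most the scraper can do is read its src. The sample page, the function name deferred_script_srcs, and the regex-based tag matching are all assumptions made for illustration; a real scraper would use a proper HTML parser.

```perl
use strict;
use warnings;

# Hypothetical interim page, modeled on the tag quoted in the question.
my $html = <<'HTML';
<html><head>
<script language="javascript1.1" defer src="a_script.js"></script>
</head><body>Searching, please wait...</body></html>
HTML

# Return the src attribute of every <script> tag that carries DEFER.
# A regex is good enough for a sketch, not for arbitrary real-world HTML.
sub deferred_script_srcs {
    my ($page) = @_;
    my @srcs;
    while ($page =~ /<script\b([^>]*)>/gi) {
        my $attrs = $1;                       # save before the next match
        next unless $attrs =~ /\bdefer\b/i;   # skip non-deferred scripts
        push @srcs, $1 if $attrs =~ /\bsrc\s*=\s*["']?([^"'\s>]+)/i;
    }
    return @srcs;
}

# The scraper can list the script URL, but nothing here executes it.
print "$_\n" for deferred_script_srcs($html);
```

Fetching that URL with LWP would return the JavaScript source as plain text. Nothing executes it, which is why the results the script is supposed to display never show up in the captured page.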

Re: Retrieving Deferred Content
by jhourcle (Prior) on Jun 02, 2005 at 19:52 UTC

    It sounds to me like someone's breaking the standards, if they're using a deferred script to generate content.

    Your best bet would be to educate the site's maintainers into writing correct HTML, and not using various quirks in browser implementations to try to accomplish something that they should have done with multipart MIME, or some other form of server push.

    (Although quite a few browsers choke on multipart MIME, like the 1.x versions of Safari, which show only the final part and not the earlier stages.)

    Of course, you've never shown how you're attempting to scrape the page, which would make this a Perl-related question and might provide a basis for people to suggest refinements. (Sure, you've explained it in words, but there's no code example.)