shakuni has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to scrape data from a website that fetches its data using Ajax, similar to Google Tasks. How can I do this? Please give me a hint.

Re: How to scrape data from ajax calls
by Corion (Patriarch) on Jan 08, 2009 at 14:05 UTC

    "Ajax" transfers the data over HTTP, just like regular web pages. So, just use whatever you use to scrape data from regular web pages. You will need to treat the results a bit differently - if JavaScript is returned, you will need to interpret it from your script.
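    A minimal sketch of that idea, assuming the page's Ajax call returns JSON rather than JavaScript source. The URL in the comment, the X-Requested-With header, and the "tasks"/"title" field names are hypothetical and not taken from any real site; the real request and response shape have to be read off the browser's network traffic first.

```perl
use strict;
use warnings;
use HTTP::Tiny;   # ships with Perl 5.14 and later
use JSON::PP;     # ships with Perl 5.14 and later

# Fetch the same URL the page's JavaScript would request.
# Some servers check this header before answering Ajax calls (assumption).
sub fetch_ajax {
    my ($url) = @_;
    my $res = HTTP::Tiny->new->get(
        $url,
        { headers => { 'X-Requested-With' => 'XMLHttpRequest' } },
    );
    die "GET $url failed: $res->{status} $res->{reason}\n"
        unless $res->{success};
    return $res->{content};
}

# Pull the task titles out of a JSON payload.
# The {"tasks":[{"title":...}]} layout is a made-up example.
sub task_titles {
    my ($json_text) = @_;
    my $data = decode_json($json_text);
    return map { $_->{title} } @{ $data->{tasks} || [] };
}

# Canned response so the sketch runs without a network connection.
# Against a real site this would be something like:
#   my $body = fetch_ajax('http://example.com/ajax/tasks');
my $body = '{"tasks":[{"title":"Buy milk"},{"title":"Write report"}]}';
print "$_\n" for task_titles($body);
```

    If the endpoint returns JavaScript instead of JSON, the decoding step would have to be replaced by something that extracts the data from that script, which depends entirely on the site.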

    Alternatively, you can try to automate the website from the outside, by using, for example, Win32::IE::Mechanize, and then capturing the traffic using Sniffer::HTTP. Where exactly do you have problems?

Re: How to scrape data from ajax calls
by marto (Cardinal) on Jan 08, 2009 at 14:07 UTC
      By Google Tasks, I mean this. I'm trying to log in to Gmail automatically and then scrape all the tasks from there. No success yet :(
        shakuni,
        OK, you have been given a hint (Using WWW::Selenium To Test Or Automate An Ajax Website). Why don't you start by saying what you have and have not accomplished?
        • Successfully logging in
        • Click the "tasks" link
        • Find the tasks pop-up window
        • Fetch the HTML of the tasks pop-up window
        • Parse the HTML to the desired end

        In other words - show some effort and give us something more to go on than "it doesn't work".

        Cheers - L~R

Re: How to scrape data from ajax calls
by locked_user sundialsvc4 (Abbot) on Jan 08, 2009 at 14:48 UTC

    Well, I see about 219 CPAN packages for “GMail” at search.cpan.org, and 548 for “Google,” so perhaps you could start there...

    Remember: “DRY = Don't Repeat Yourself.” In fact, don't repeat anyone in the world if you can help it.

    You can be absolutely sure that you are not the first person to have worked on getting useful information from Google or GMail. You can also be sure that, as soon as someone's put together a decent and general-purpose “way to do that,” it's going to show up on CPAN. Therefore, practical software-development in the Perl world consists very heavily of searching for, discovering, and then leveraging existing well-tested software assets from CPAN and other sources. Your task is surely no exception. There is absolutely nothing about “dealing with AJAX, either as a client or as a server,” that you must “invent.”

    This way of thinking does take some getting used to, because in the academic world “borrowing somebody else's work” is called “cheating.”