in reply to Parsing AJAX-based website

My advice is to ignore the AJAX layer and target the lower-level HTTP requests going from the browser to the server. You can capture these using a proxy like HTTP::Recorder, or a browser tool like the Firebug plugin.

If you're lucky you'll find that underneath the glitzy AJAX there's a relatively simple protocol - the browser POSTs some data (user/pass) and gets back some JSON or XML indicating the result (login succeeded or failed). You can then use WWW::Mechanize to imitate that protocol and extract the info you need. If the site authors did a good job this can actually be easier than scraping an HTML page.
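To make that concrete, here is a minimal sketch of imitating such a protocol in Perl. The login URL, field names, and JSON keys are all hypothetical assumptions; capture the real request with Firebug and copy them from it:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # core module since Perl 5.14

# Hypothetical protocol -- the URL, field names, and JSON keys below are
# assumptions; read the real ones out of the captured request.
# With WWW::Mechanize the POST itself would look something like:
#
#   my $mech = WWW::Mechanize->new;
#   my $resp = $mech->post('https://example.com/ajax/login',
#                          { user => 'me', pass => 'secret' });
#   my $result = decode_json($resp->decoded_content);

# Decoding the kind of JSON such an endpoint might return:
my $result = decode_json('{"status":"ok","session":"abc123"}');
print $result->{status}, "\n";    # prints "ok" for a successful login
```

Once you can decode the response, checking `$result->{status}` (or whatever flag the real server uses) tells you whether the login worked, with no HTML scraping at all.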

Good luck!

-sam

Replies are listed 'Best First'.
Re^2: Parsing AJAX-based website
by Anonymous Monk on Feb 25, 2013 at 12:48 UTC
    I am trying to parse a website. The site loads a basic page that I can fetch with $mech->get($url), but the tags and page data that I want are not in the fetched content; I think they are loaded by an AJAX call. I do not know how to get my script to retrieve those tags. When I look in Google Chrome (Inspect Element), IE (F12 key), or Mozilla Firebug, all of them show the tags I am looking for. But how do I get to the tags? Help will be appreciated.

      Look at the Net tab in Firebug or the Network tab in Chrome's developer tools and find all the requests going out.

      Identify the request you are interested in, work out how it is constructed (based on the AJAX code on the page), and make that call yourself.

      As a note, some of the modules mentioned above do this for you.
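      Once you have identified the request, you can replicate it with a core module such as HTTP::Tiny. A sketch, where the URL, query parameter, and the "tags" key in the response are all hypothetical; substitute what you actually see in the Network tab:

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Hypothetical endpoint -- copy the real URL, parameters, and headers
# from the request you identified in the Network tab.
my $url = 'https://example.com/ajax/tags?item=42';

my $resp = HTTP::Tiny->new->get($url, {
    headers => {
        # some servers check this header before serving the AJAX response
        'X-Requested-With' => 'XMLHttpRequest',
    },
});

if ($resp->{success}) {
    # guard the decode in case the server returned HTML instead of JSON
    my $data = eval { decode_json($resp->{content}) };
    print "got ", scalar @{ $data->{tags} }, " tags\n"
        if $data && ref $data->{tags} eq 'ARRAY';
}
else {
    warn "request failed: $resp->{status} $resp->{reason}\n";
}
```

      The same decode-and-walk step works on whatever JSON structure the real endpoint returns; dump `$data` with Data::Dumper first to see its shape.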