in reply to Parsing AJAX-based website

My advice is to ignore the AJAX layer and target the lower-level HTTP requests going from the browser to the server. You can capture these using a proxy like HTTP::Recorder, or a browser tool like the Firebug plugin.

If you're lucky you'll find that underneath the glitzy AJAX there's a relatively simple protocol - the browser POSTs some data (user/pass) and gets back some JSON or XML indicating the result (login succeeded or failed). You can then use WWW::Mechanize to imitate that protocol and extract the info you need. If the site authors did a good job this can actually be easier than scraping an HTML page.
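To make that concrete, here is a minimal sketch of imitating such a protocol in Perl. The login URL, field names, and JSON keys are all hypothetical assumptions; capture the real request with Firebug and copy them from it:

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);    # core module since Perl 5.14

# Hypothetical protocol -- the URL, field names, and JSON keys below are
# assumptions; read the real ones out of the captured request.
# With WWW::Mechanize the POST itself would look something like:
#
#   my $mech = WWW::Mechanize->new;
#   my $resp = $mech->post('https://example.com/ajax/login',
#                          { user => 'me', pass => 'secret' });
#   my $result = decode_json($resp->decoded_content);

# Decoding the kind of JSON such an endpoint might return:
my $result = decode_json('{"status":"ok","session":"abc123"}');
print $result->{status}, "\n";    # prints "ok" for a successful login
```

Once you can decode the response, checking `$result->{status}` (or whatever flag the real server uses) tells you whether the login worked, with no HTML scraping at all.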

Good luck!

-sam

Replies are listed 'Best First'.
Re^2: Parsing AJAX-based website
by Anonymous Monk on Feb 25, 2013 at 12:48 UTC
    I am trying to parse a website. The site loads a basic page that I can fetch with $mech->get($url), but the tags and page data that I want are not in the fetched content; I think they are loaded by an AJAX call. I do not know how to get my script to retrieve those tags. When I look in Google Chrome (Inspect Element), IE (F12 key), or Mozilla Firebug, all of them show the tags I am looking for. But how do I get to the tags? Help will be appreciated.

      Look at the Net tab in Firebug or the Network tab in Chrome's developer tools and find all the requests going out.

      Identify the request you are interested in, work out how it is constructed (based on the AJAX code on the page), and make that call yourself.

      As a note, some of the modules mentioned above do this for you.
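      Once you have identified the request, you can replicate it with a core module such as HTTP::Tiny. A sketch, where the URL, query parameter, and the "tags" key in the response are all hypothetical; substitute what you actually see in the Network tab:

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP qw(decode_json);

# Hypothetical endpoint -- copy the real URL, parameters, and headers
# from the request you identified in the Network tab.
my $url = 'https://example.com/ajax/tags?item=42';

my $resp = HTTP::Tiny->new->get($url, {
    headers => {
        # some servers check this header before serving the AJAX response
        'X-Requested-With' => 'XMLHttpRequest',
    },
});

if ($resp->{success}) {
    # guard the decode in case the server returned HTML instead of JSON
    my $data = eval { decode_json($resp->{content}) };
    print "got ", scalar @{ $data->{tags} }, " tags\n"
        if $data && ref $data->{tags} eq 'ARRAY';
}
else {
    warn "request failed: $resp->{status} $resp->{reason}\n";
}
```

      The same decode-and-walk step works on whatever JSON structure the real endpoint returns; dump `$data` with Data::Dumper first to see its shape.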