in reply to Parsing AJAX-based website

How do I actually simulate this to get to the information I want to extract?

With great patience and attention to detail.

It's just about impossible in the general case. Have a look at the WWW::Mechanize::FAQ, in the section "Why don't you support JavaScript?". In a nutshell, you'd need to reimplement a JavaScript engine, and that's a non-trivial undertaking.

The usual way of going about automating a website that makes heavy use of JavaScript is to insert an HTTP::Proxy-based proxy between your client and the server, to record precisely what is being passed back and forth. At the lowest level it's just GETs, POSTs and redirects, which you can then replay in your own program.
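A minimal sketch of such a recording proxy, using HTTP::Proxy's simple header-filter hook (the port and log format here are arbitrary choices, not anything the site requires):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Proxy;
use HTTP::Proxy::HeaderFilter::simple;

# Listen on localhost:8080; point the browser's HTTP proxy setting here
# and click through the site as usual.
my $proxy = HTTP::Proxy->new( port => 8080 );

# Log the method, URI and body of every request passing through, so the
# background AJAX calls show up alongside the ordinary page loads.
$proxy->push_filter(
    request => HTTP::Proxy::HeaderFilter::simple->new(
        sub {
            my ( $self, $headers, $message ) = @_;
            print $message->method, ' ', $message->uri, "\n";
            print $message->content, "\n" if length $message->content;
        }
    ),
);

$proxy->start;
```

Once you can see which GETs and POSTs the page's JavaScript actually issues, replaying them directly with LWP::UserAgent or WWW::Mechanize is usually straightforward.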

But depending on the site, this can be very difficult to do.

• another intruder with the mooring in the heart of the Perl

Replies are listed 'Best First'.
Re^2: Parsing AJAX-based website
by Joost (Canon) on Feb 09, 2008 at 17:16 UTC
    The JavaScript engine isn't even that big a problem. There are at least two modules on CPAN that interface with the SpiderMonkey JS interpreter (though last time I checked, neither was complete, and using them wasn't exactly trivial).
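    For what it's worth, a bare-bones run of one of them (JavaScript::SpiderMonkey here; the module's interface may have changed since, so treat this as a sketch) looks something like:

    ```perl
    use strict;
    use warnings;
    use JavaScript::SpiderMonkey;

    my $js = JavaScript::SpiderMonkey->new();
    $js->init();    # set up a runtime, context and global object

    # Evaluate some plain JavaScript; on failure the error text is in $@.
    $js->eval(q{ var x = 1 + 1; }) or die "JS error: $@";

    $js->destroy();
    ```

    Plain expressions like that work; the trouble starts as soon as the script touches document, window or XMLHttpRequest, none of which exist in a bare interpreter.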

    The real issue is that you'd have to implement pretty much the whole DOM too, including most of the non-standardized stuff, which is really, really tedious work.