http://qs1969.pair.com?node_id=639109


in reply to Re^2: Web Spidering Ajax Sites
in thread Web Spidering Ajax Sites

That information is correct, but totally irrelevant. Mech has no support for JavaScript, but the server doesn't know that. If you needed to actually execute some JavaScript code, Mech couldn't do it. All you want to do, though, is talk to the server as if you were a browser (with JavaScript), and Mech can do that.

There is nothing that JavaScript can make a browser send to the server that you can't mimic with Mech. The only hard part is figuring out exactly what the JavaScript would send, and using HTTP::Recorder with your browser (or using some other means of looking at the requests, like LiveHTTPHeaders) solves that for you.
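To make that concrete, here is a rough sketch of replaying a request the page's JavaScript would have sent. It is written in Python with only the standard library, so it does not show Mech's API, just the idea. The URL, the `city` field, and the helper name `build_ajax_request` are made up for illustration; in practice you would copy the real URL, fields, and headers from whatever your recorder or header viewer captured. The `X-Requested-With` header is a convention that many Ajax libraries add, and some servers check for it.

```python
import urllib.parse
import urllib.request


def build_ajax_request(url, fields):
    """Build (but do not send) a POST shaped like a browser's Ajax call."""
    # Form-encode the fields the way a typical Ajax form post would.
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            # Added by many Ajax libraries; assumed here, check your capture.
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


# Hypothetical endpoint and field, standing in for what you recorded.
req = build_ajax_request("http://example.com/ajax/weather", {"city": "Paris"})
```

Passing `req` to `urllib.request.urlopen` would send it. From the server's side it is just an HTTP request; whether JavaScript or a script produced it makes no difference.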

Re^4: Web Spidering Ajax Sites
by sgt (Deacon) on Sep 18, 2007 at 13:26 UTC

    Yes, I agree completely with your second paragraph, but the point I was trying to make was that the OP was asking how to deal with JavaScript, and possibly what extra needs to be done for AJAX.

    So if your web scraper has to deal with content (for some definition of web scraping), what do you do if the server sends back some kind of serialized data that only a true JavaScript engine can decode?

    cheers --stephan
      If the service returns JSON, it would be much easier to parse than HTML. Normally it sends back some HTML that only a true HTML engine can decode, but that doesn't stop us from grabbing the current weather out of it. If you are trying to test your JavaScript, you need something like Selenium. If you just want to scrape a site, you don't need JavaScript, even for an Ajax site.
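As a small illustration of the JSON case, here is a Python sketch that pulls the current weather out of a JSON body without any JavaScript engine. The response text, its field names (`current`, `temp_c`, `conditions`), and the helper `current_weather` are invented for the example; a real service will have its own layout, which you would read off a captured response.

```python
import json

# Hypothetical response body, standing in for what the server sent back.
body = '{"city": "Paris", "current": {"temp_c": 18, "conditions": "cloudy"}}'


def current_weather(json_text):
    """Return (temperature, conditions) from the assumed JSON layout."""
    data = json.loads(json_text)  # plain data parsing, nothing is executed
    current = data["current"]
    return current["temp_c"], current["conditions"]


temp_c, conditions = current_weather(body)
```

JSON is a data format, so a parser is enough. Only a response that is actual JavaScript code, rather than data, would need more than this.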