
Re^4: Web Spidering Ajax Sites

by sgt (Deacon)
on Sep 18, 2007 at 13:26 UTC (#639637)

in reply to Re^3: Web Spidering Ajax Sites
in thread Web Spidering Ajax Sites

Yes, I agree completely with your second paragraph, but the point I was trying to make was that the OP was asking how to deal with JavaScript, and possibly what extra needs to be done for AJAX.

So if your web scraper needs to deal with content (for some definition of web scraping), what do you do if the server sends back some kind of serialized data that only a true JS engine can decode?

cheers --stephan

Replies are listed 'Best First'.
Re^5: Web Spidering Ajax Sites
by perrin (Chancellor) on Sep 19, 2007 at 03:40 UTC
    If the service returns JSON, it would be much easier to parse than HTML. Normally it sends back some HTML that only a true HTML engine can decode, but that doesn't stop us from grabbing the current weather out of it. If you are trying to test your JavaScript, you need something like Selenium. If you just want to scrape a site, you don't need JavaScript, even for an Ajax site.
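
To illustrate the point about JSON being easier to parse than HTML, here is a minimal sketch in Python. The response body and its field names (`location`, `current`, `temp_c`, `summary`) are made up for illustration and do not come from any real service. The helper `current_weather` is likewise hypothetical.

```python
import json

# Hypothetical response body from an Ajax weather endpoint. The field names
# are assumptions for illustration, not taken from any real service.
SAMPLE_BODY = '{"location": "Lisbon", "current": {"temp_c": 21.5, "summary": "Sunny"}}'


def current_weather(body):
    """Decode the JSON text and return (temperature, summary)."""
    data = json.loads(body)      # a plain JSON parser; no JavaScript engine involved
    current = data["current"]
    return current["temp_c"], current["summary"]


temp_c, summary = current_weather(SAMPLE_BODY)
print(f"{summary}, {temp_c} C")
```

In a real scraper the body would presumably come from an ordinary HTTP GET against whatever URL the page's JavaScript calls, which you would have to discover yourself, for example by watching the browser's network traffic.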
