in reply to Are there any memory-efficient web scrapers?

> Does a memory-efficient alternative exist that writes the content to files and incrementally parses the content?

I ran into a similar problem a while back and wrote WWW::Crawler::Lite as a result. You supply an `on_response` handler, which you can use to do anything you want (write the response to disk, parse it later or incrementally, etc.). You'll be starting a bit closer to the ground with this one than with WWW::Mechanize, but its memory footprint is tiny compared to 'Mech.
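Roughly, it looks like the sketch below, which writes each fetched page straight to disk instead of keeping it in memory. The callback names and signatures follow the module's synopsis as I remember it (check the docs for your version), and the start URL, output directory, agent string, and filename scheme are just placeholders:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use WWW::Crawler::Lite;
use Digest::MD5 qw( md5_hex );
use File::Spec;

# Directory where fetched pages land; parse them later, incrementally.
my $out_dir = 'crawled';
mkdir $out_dir unless -d $out_dir;

my $crawler = WWW::Crawler::Lite->new(
  agent       => 'ExampleSpider/0.01',
  # Only URLs matching this pattern are crawled:
  url_pattern => 'https?://example\.com/',
  on_response => sub {
    my ( $url, $res ) = @_;    # $res is an HTTP::Response

    # Write the raw body to disk instead of holding it in memory:
    my $file = File::Spec->catfile( $out_dir, md5_hex($url) . '.html' );
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} $res->content;
    close $fh;
  },
  on_link     => sub {
    my ( $from, $to, $text ) = @_;
    # Notification hook; record links here if you want a site map.
  },
);

$crawler->crawl( url => 'http://example.com/' );
```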

You can see an example spider that crawls search.cpan.org in the t/ folder of the module: http://cpansearch.perl.org/src/JOHND/WWW-Crawler-Lite-0.003/t/010-basic/
