Does a memory-efficient alternative exist that writes the content to files and incrementally parses the content?
I ran into a similar problem a while back and wrote WWW::Crawler::Lite as a result. You supply the `on_response` handler, which you can use to do anything you want (write the response to disk, parse it later or incrementally, etc.). You'll be starting a bit closer to the ground with this one (compared to WWW::Mechanize), but its memory footprint is tiny compared to 'Mech. A rough sketch of the shape of it is below.
You can see an example spider that crawls search.cpan.org in the module's t/ folder: http://cpansearch.perl.org/src/JOHND/WWW-Crawler-Lite-0.003/t/010-basic/
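Something along these lines would do it (untested sketch; check the module's POD for the exact constructor arguments and callback signatures, which I'm recalling from memory here):

  use strict;
  use warnings;
  use WWW::Crawler::Lite;

  my $crawler = WWW::Crawler::Lite->new(
    agent       => 'MyBot/0.01',
    # Assumed: only URLs matching this pattern get followed.
    url_pattern => 'https?://search\.cpan\.org/',
    on_response => sub {
      my ( $url, $res ) = @_;   # $res should be an HTTP::Response
      # Write the body straight to disk rather than holding it in memory;
      # parse the saved files later/incrementally however you like.
      ( my $file = $url ) =~ s{[^\w\.]+}{_}g;
      open my $fh, '>', "pages/$file" or die "pages/$file: $!";
      print {$fh} $res->content;
      close $fh;
    },
  );

  $crawler->crawl( url => 'http://search.cpan.org/' );

Since nothing but the current response is kept around, memory stays more or less flat no matter how many pages you pull down.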