in reply to Help on Crawling
It all depends on what you really want to do with the site(s). If you need to fill in forms and generally interact with a dynamic site, WWW::Mechanize is the best choice. LWP::UserAgent is more low-level. WWW::Mechanize is actually a subclass of LWP::UserAgent, so everything LWP::UserAgent can do is still available from WWW::Mechanize, but you'll pay a (slight) performance cost because WWW::Mechanize parses every page for forms and links even if you don't need that information.
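A minimal sketch of the WWW::Mechanize approach; the URL and the form field name (`q`) here are hypothetical placeholders, so adjust them for whatever site you're actually crawling:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new( autocheck => 1 );

# Hypothetical target -- substitute your real starting page.
$mech->get('http://example.com/search');

# Fill in and submit a form, just as a browser would.
$mech->submit_form(
    form_number => 1,
    fields      => { q => 'perl crawling' },
);

# Mechanize has already parsed the page's links for us.
for my $link ( $mech->links ) {
    print $link->url, "\n";
}

# Because WWW::Mechanize isa LWP::UserAgent, the low-level
# methods (timeout, proxy, credentials, ...) still work too:
$mech->timeout(30);
```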
IIRC HTML::TokeParser doesn't do HTTP retrieval itself, so on its own it isn't enough to crawl web pages; you'd pair it with something like LWP::UserAgent to do the fetching.
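For completeness, that pairing looks roughly like this (the URL is again a hypothetical placeholder): LWP::UserAgent fetches the page, and HTML::TokeParser walks the tokens to pull out the links.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TokeParser;

# LWP::UserAgent does the fetching...
my $ua  = LWP::UserAgent->new( timeout => 30 );
my $res = $ua->get('http://example.com/');    # hypothetical URL
die $res->status_line unless $res->is_success;

# ...and HTML::TokeParser does the parsing.
my $p = HTML::TokeParser->new( \$res->decoded_content );
while ( my $tag = $p->get_tag('a') ) {
    my $href = $tag->[1]{href};               # attributes are in the hashref
    print "$href\n" if defined $href;
}
```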
I'd probably recommend WWW::Mechanize, unless your spider has a really specific use case that WWW::Mechanize doesn't fit and you need the performance benefit of the lower-level modules.