in reply to Web scraping toolkit?
Personally, I also wrote App::scrape to hide away my extraction library consisting of HTML::TreeBuilder::XPath and HTML::TokeParser.
But that library only handles convenient extraction from HTML, not navigation between pages and the like.
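To give an idea of the kind of extraction meant here, this is a minimal sketch that uses HTML::TreeBuilder::XPath directly. It is not the App::scrape interface; the helper name `extract_links` and the sample HTML are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical helper (not part of App::scrape): return the link text
# and href of every <a href="..."> element found in an HTML string.
sub extract_links {
    my ($html) = @_;

    # Parse the HTML string into a tree that understands XPath queries.
    my $tree = HTML::TreeBuilder::XPath->new_from_content($html);

    # In list context, findnodes returns the matching HTML::Element nodes.
    my @links = map {
        +{ text => $_->as_text, href => $_->attr('href') }
    } $tree->findnodes('//a[@href]');

    $tree->delete;    # release the parse tree explicitly
    return @links;
}

# Sample input, made up for this example.
my $html = '<html><body><a href="/one">First</a> <a href="/two">Second</a></body></html>';
printf "%s -> %s\n", $_->{text}, $_->{href} for extract_links($html);
```

Note that this only covers the extraction half: fetching the page and following links is a separate job.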
I like the navigation and extraction API of WWW::Mechanize::Firefox, which mostly combines the APIs of HTML::TreeBuilder::XPath and WWW::Mechanize. Most likely I'm biased here, because I'm the author of that module.
The best simplistic, low-boilerplate approach I've seen is Querylet, which is a source filter for describing DBI reports. Maybe you can reformulate your extractions in a language like it. I wrote (but so far have never used in production) a source-filter-less, pluggable version of Querylet at https://github.com/Corion/querylet/tree/pluggable, so if you dislike source filters but like the general language format, you can maybe reuse that parser instead.
Re^2: Web scraping toolkit?
by mzedeler (Pilgrim) on Jan 27, 2012 at 08:44 UTC |