in reply to Links for Screen Scraping

WWW::Mechanize is a subclass of LWP::UserAgent. All methods of LWP::UserAgent are available in WWW::Mechanize too, so posting works the same as with LWP::UserAgent. In most cases, though, the POST request is done with the WWW::Mechanize click method, or with the submit method (if there is no button to click). Reading the WWW::Mechanize documentation should prove helpful when looking for interesting methods.
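A minimal sketch of that flow; the URL, form fields, and button are hypothetical placeholders, not from any real site:

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Fetch the page carrying the form. The URL and field names
# below are made up for illustration.
$mech->get('http://www.example.com/login');

# Fill out the first form on the page.
$mech->form_number(1);
$mech->field( username => 'monk' );
$mech->field( password => 's3kr3t' );

# If the form has a submit button, click() sends the POST:
$mech->click();

# If there is no button to click, submit() sends it instead:
# $mech->submit();

print $mech->status, "\n";   # HTTP status of the response
```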

For some more specialized scraping modules, take a look at the WWW::Search:: module namespace and the Finance::Bank:: module namespace.

I haven't found any problems in using WWW::Mechanize for all my scraping needs, together with HTML::TableExtract to scrape stuff out of HTML tables afterwards.
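To show how HTML::TableExtract pulls data back out of fetched HTML, here is a small self-contained sketch; the table content is invented, and in practice the HTML would come from $mech->content:

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical HTML fragment standing in for a scraped page.
my $html = <<'HTML';
<table>
  <tr><th>Symbol</th><th>Price</th></tr>
  <tr><td>FOO</td><td>42.50</td></tr>
  <tr><td>BAR</td><td>13.37</td></tr>
</table>
HTML

# Target the table by its column headers rather than its position,
# so the scraper survives layout changes elsewhere on the page.
my $te = HTML::TableExtract->new( headers => [ 'Symbol', 'Price' ] );
$te->parse($html);

for my $table ( $te->tables ) {
    for my $row ( $table->rows ) {
        print join( ' => ', @$row ), "\n";
    }
}
```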

Unless you implement a DOM, you will have to interpret the JavaScript on the pages yourself and translate it to Perl code manually.

Re: Links for Screen Scraping
by Abigail-II (Bishop) on May 26, 2004 at 10:52 UTC
    Sure, you can call the inherited post(), but that doesn't give you the benefits of WWW::Mechanize. WWW::Mechanize keeps your current page, so you can follow links, fill out forms, etc. By calling an inherited function, you cut out the valuable middle man.

    The reason WWW::Mechanize doesn't have (or need) a post method is the same reason browsers don't give users a way to type in raw POST requests: POST requests are typically made by submitting forms - and WWW::Mechanize has good functionality for that.
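    To illustrate the difference (the URL and field names below are hypothetical): the inherited post() requires you to build the parameter hash yourself, while submit_form() lets Mechanize read the form definition from the current page and keeps that page state so you can navigate onward.

```perl
use strict;
use warnings;
use WWW::Mechanize;

my $mech = WWW::Mechanize->new();

# Raw POST via the inherited LWP::UserAgent method: you must
# assemble the parameters by hand instead of letting Mechanize
# read the form definition out of the page.
my $response = $mech->post(
    'http://www.example.com/search',
    { query => 'perl' },
);

# The Mechanize way: fetch the page, then submit its form. The
# current page is tracked, so you can keep navigating from the
# result.
$mech->get('http://www.example.com/search');
$mech->submit_form(
    form_number => 1,
    fields      => { query => 'perl' },
);
$mech->follow_link( n => 1 );   # works, because state was kept
```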

    Abigail