ssnewbie has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have written only basic Perl code before. Can somebody tell me how to extract information from a webpage and save it in a database? Where should I start? Is there good reading material? Should I install something apart from Perl? Thanks,

Replies are listed 'Best First'.
Re: How can I extract all the information on a webpage to my database?
by Fletch (Bishop) on May 09, 2005 at 20:57 UTC

    Perl and LWP (ISBN 0596001789) covers everything you'll need to know to do web scraping.

    Update: Actually, I don't think it covers WWW::Mechanize, so amend that to "covers almost everything".

Re: How can I extract all the information on a webpage to my database?
by mda2 (Hermit) on May 09, 2005 at 21:04 UTC
Re: How can I extract all the information on a webpage to my database?
by davidrw (Prior) on May 09, 2005 at 21:00 UTC
    LWP is probably what you'll want at the core of your scraping. LWP::Simple and WWW::Mechanize are built on LWP and are both very powerful. Depending on what you want to scrape, there may already be a module for it.
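
    A minimal sketch of the LWP::Simple approach mentioned above (the URL is a placeholder, and this assumes LWP is installed from CPAN; the regex is only a crude illustration, not a substitute for a real HTML parser):

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    # Fetch the page (URL is a placeholder -- substitute your own).
    my $url  = 'http://www.example.com/';
    my $html = get($url)
        or die "Couldn't fetch $url\n";

    # Crude extraction: pull the page title with a regex.
    # For real parsing, use HTML::TreeBuilder or WWW::Mechanize instead.
    my ($title) = $html =~ m{<title>(.*?)</title>}is;
    print "Title: ", (defined $title ? $title : '(none)'), "\n";
    ```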
Re: How can I extract all the information on a webpage to my database?
by Joost (Canon) on May 09, 2005 at 21:16 UTC
Re: How can I extract all the information on a webpage to my database?
by devnul (Monk) on May 09, 2005 at 23:31 UTC
    Just wanted to add an endorsement for WWW::Mechanize.... It is extremely flexible, IMHO. You may want to pair it with HTML::TreeBuilder or HTML::TableExtract for even more fun.

    - dEvNuL
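
    A hedged sketch of how these pieces could fit together to answer the original question -- WWW::Mechanize to fetch, HTML::TableExtract to parse, and DBI to store. The URL, table headers, and database schema here are all assumptions for illustration; adjust them to the actual page and database:

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize;
    use HTML::TableExtract;
    use DBI;

    # Fetch the page (URL is a placeholder).
    my $mech = WWW::Mechanize->new;
    $mech->get('http://www.example.com/data.html');

    # Pull rows from a table with the expected column headers
    # (the header names 'Name' and 'Value' are assumptions).
    my $te = HTML::TableExtract->new( headers => [ 'Name', 'Value' ] );
    $te->parse( $mech->content );

    # Store each row in an SQLite database via DBI
    # (table name and schema are likewise illustrative).
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=scrape.db', '', '',
                            { RaiseError => 1 } );
    $dbh->do('CREATE TABLE IF NOT EXISTS items (name TEXT, value TEXT)');
    my $sth = $dbh->prepare('INSERT INTO items (name, value) VALUES (?, ?)');

    for my $ts ( $te->tables ) {
        for my $row ( $ts->rows ) {
            $sth->execute( $row->[0], $row->[1] );
        }
    }
    $dbh->disconnect;
    ```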