Text from Website

new_monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Text from Website by friedo (Prior) on Aug 03, 2004 at 14:35 UTC
LWP::Simple should work for your needs. If you need to do something more complicated than a simple GET, use LWP::UserAgent.
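To make the two options above concrete, here is a minimal sketch, not a definitive implementation. The sub names `fetch_simple` and `fetch_with_ua`, the timeout, and the agent string are made up for illustration; the URL is assumed to come from the command line.

```perl
use strict;
use warnings;
use LWP::Simple qw(get);
use LWP::UserAgent;

# Simple case: get() returns the page body, or undef on any failure.
sub fetch_simple {
    my ($url) = @_;
    return get($url);
}

# When more control is needed (timeout, agent string, status checks),
# LWP::UserAgent gives access to the full response object.
# The timeout and agent values here are arbitrary illustrative choices.
sub fetch_with_ua {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(
        timeout => 15,
        agent   => 'new_monk-example/0.1',
    );
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# Only touch the network when a URL is given on the command line.
if ( my $url = shift @ARGV ) {
    my $html = fetch_simple($url);
    $html = fetch_with_ua($url) unless defined $html;
    die "Could not fetch $url\n" unless defined $html;
    print $html;
}
```

`get` is enough when all you need is the body; the user agent version is where you would later add headers, cookies, or form posts.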
Re: Text from Website by Fletch (Bishop) on Aug 03, 2004 at 14:41 UTC
Yes, LWP; specifically, read `perldoc lwptut` and `perldoc lwpcook`. See also the excellent Perl and LWP (ISBN 0596001789) for more on "screen scraping" HTML.
Re: Text from Website by tachyon (Chancellor) on Aug 03, 2004 at 14:44 UTC
See Re: HTML::Strip Problem for some sample code that probably does exactly what you want (get the page with LWP and strip the text with HTML::Parser). It also notes a few of the issues with screen scraping: redirects, meta refreshes, frames, and JavaScript can all work against you. Add spider-unfriendly sites to the list and you will soon want LWP::UserAgent and some wrapper code to get the data you probably want. cheers tachyon
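The linked node is not reproduced here, so the following is only a rough sketch of the general idea (fetch with LWP, strip the text with HTML::Parser), not tachyon's actual code. The sub name `html_to_text` and the timeout and redirect limits are assumptions chosen for illustration. LWP::UserAgent follows ordinary HTTP redirects by itself, but meta refreshes, frames, and JavaScript would still need extra handling that this sketch does not attempt.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Parser;

# Reduce an HTML string to its visible text (hypothetical helper).
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # Collect every text chunk with entities already decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );
    # Skip script and style elements, content included.
    $parser->ignore_elements(qw(script style));
    $parser->parse($html);
    $parser->eof;
    $text =~ s/\s+/ /g;          # collapse runs of whitespace
    $text =~ s/^\s+|\s+$//g;     # trim both ends
    return $text;
}

# Only touch the network when a URL is given on the command line.
if ( my $url = shift @ARGV ) {
    my $ua = LWP::UserAgent->new( timeout => 15, max_redirect => 5 );
    my $res = $ua->get($url);
    die 'Fetch failed: ' . $res->status_line . "\n" unless $res->is_success;
    print html_to_text( $res->decoded_content ), "\n";
}
```

Text from adjacent elements is simply concatenated, so block-level boundaries are not preserved; that is one of the places where site-specific wrapper code tends to creep in.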
Re: Text from Website by nite_man (Deacon) on Aug 03, 2004 at 14:46 UTC
In addition, take a look at HTML::TokeParser. It is a very good module for parsing an HTML page and grabbing information according to the page structure. --- Schiller It's only my opinion and it doesn't have pretensions of absoluteness!
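As a small illustration of grabbing information by page structure, the sketch below walks the `<a>` tags of a page with HTML::TokeParser. The sub name `extract_links` and the sample HTML are invented for the example; a real page would come from LWP.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a list of [href, link text] pairs found in an HTML string.
sub extract_links {
    my ($html) = @_;
    my $p = HTML::TokeParser->new( \$html )
        or die "Cannot create parser\n";
    my @links;
    # get_tag('a') skips ahead to the next <a ...> start tag.
    while ( my $tag = $p->get_tag('a') ) {
        my $href = $tag->[1]{href};    # attribute hash is element 1
        next unless defined $href;     # named anchors have no href
        # Collect the text up to the closing </a>, whitespace trimmed.
        my $text = $p->get_trimmed_text('/a');
        push @links, [ $href, $text ];
    }
    return @links;
}

# Hypothetical sample input, only for demonstration.
my $sample = '<p><a href="/docs">The <b>docs</b></a> and '
           . '<a href="http://example.com/">a site</a></p>';
printf "%s => %s\n", @$_ for extract_links($sample);
```

The same `get_tag` / `get_trimmed_text` pattern works for table cells, headings, or any other element whose position in the page is predictable.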
Re: Text from Website by Popcorn Dave (Abbot) on Aug 03, 2004 at 23:59 UTC
As other monks have pointed out, LWP::Simple and HTML::TokeParser should do exactly what you want. However, you will want to use your browser to save the original source of the page you're trying to parse, so that you don't hammer the server while you're testing your code. There is no emoticon for what I'm feeling now.
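If you would rather not save the page by hand, one variation on the same idea is to fetch it once and then test against the local copy. This is only a sketch of that variation, not what was described above; the sub names `cached_copy` and `page_title` are made up, and the URL and file name are assumed to come from the command line.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use HTML::TokeParser;

# Fetch $url into $file only if no non-empty local copy exists yet,
# so repeated test runs never hit the server again.
sub cached_copy {
    my ( $url, $file ) = @_;
    return $file if -s $file;
    my $status = getstore( $url, $file );
    die "Fetch failed with status $status\n" unless is_success($status);
    return $file;
}

# Pull the <title> text out of a saved page.
sub page_title {
    my ($file) = @_;
    my $p = HTML::TokeParser->new($file)
        or die "Cannot open $file: $!\n";
    return $p->get_tag('title') ? $p->get_trimmed_text('/title') : undef;
}

# Usage: perl script.pl URL LOCAL_FILE
if ( @ARGV == 2 ) {
    my ( $url, $file ) = @ARGV;
    my $title = page_title( cached_copy( $url, $file ) );
    print defined $title ? "$title\n" : "(no title found)\n";
}
```

Delete the local file when you want a fresh copy; everything else in your parsing code can stay pointed at the file while you iterate.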