new_monk has asked for the wisdom of the Perl Monks concerning the following question:

Oh Great Monks, Novice here, I am trying to grab text from a website and use the data in my script. What would I need to review in order to learn how to do this? LWP? nm

Replies are listed 'Best First'.
Re: Text from Website
by friedo (Prior) on Aug 03, 2004 at 14:35 UTC
Re: Text from Website
by Fletch (Bishop) on Aug 03, 2004 at 14:41 UTC

    Yes, LWP; specifically, read perldoc lwptut and perldoc lwpcook. Also see the excellent Perl and LWP (ISBN 0596001789) for more on "screen scraping" HTML.

Re: Text from Website
by tachyon (Chancellor) on Aug 03, 2004 at 14:44 UTC

    See Re: HTML::Strip Problem for some sample code that probably does exactly what you want (get the page with LWP and strip the text with HTML::Parser). It also notes a few of the issues with screen scraping. Redirects, meta refreshes, frames, and JavaScript can all work against you. Add spider-unfriendly sites to the list and you will soon want LWP::UserAgent and some wrapper code to get the data you probably want.
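
    The approach described above (get the page with LWP, strip the text with HTML::Parser) might look roughly like the following. This is only a sketch and not the code from the linked node: the sub names fetch_page and html_to_text, the agent string, and the timeout are assumptions made for illustration. It does nothing about meta refreshes, frames, or JavaScript.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::Parser;

# Fetch a page and return its decoded HTML, or die with the HTTP status line.
# (fetch_page is a hypothetical helper name, not part of LWP.)
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(
        agent   => 'example-scraper/0.1',   # assumed agent string
        timeout => 15,                      # assumed timeout in seconds
    );
    my $response = $ua->get($url);          # ordinary HTTP redirects are followed
    die "Cannot fetch $url: ", $response->status_line, "\n"
        unless $response->is_success;
    return $response->decoded_content;
}

# Reduce an HTML string to its visible text.
# (html_to_text is a hypothetical helper name as well.)
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        # 'dtext' hands the handler text with entities already decoded
        text_h      => [ sub { $text .= shift }, 'dtext' ],
    );
    $parser->ignore_elements(qw(script style));   # skip non-visible content
    $parser->parse($html);
    $parser->eof;
    $text =~ s/\s+/ /g;      # collapse runs of whitespace
    $text =~ s/^ | $//g;     # trim the ends
    return $text;
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    print html_to_text(fetch_page($ARGV[0])), "\n";
}
```

    Keeping the fetch and the text extraction in separate subs means the parsing half can be tried on a plain string without hitting any server.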

    cheers

    tachyon

Re: Text from Website
by nite_man (Deacon) on Aug 03, 2004 at 14:46 UTC

    In addition, take a look at HTML::TokeParser.
    It is a very good module for parsing an HTML page and grabbing information according to the page structure.
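
    As a small illustration of picking information out by page structure, here is a sketch that collects the links from an HTML string with HTML::TokeParser. The sub name extract_links and the choice of links as the target are assumptions for the example; the same get_tag / get_trimmed_text pattern should carry over to other tags.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;

# Return a list of [href, link text] pairs found in an HTML string.
# (extract_links is a hypothetical helper name.)
sub extract_links {
    my ($html) = @_;
    # Passing a scalar reference makes TokeParser parse the string itself
    my $parser = HTML::TokeParser->new(\$html)
        or die "Cannot create parser: $!\n";
    my @links;
    # get_tag('a') skips ahead to the next <a ...> start tag
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};         # $tag->[1] is the attribute hash
        next unless defined $href;          # e.g. <a name="..."> anchors
        # Collect the text up to the closing </a>, whitespace normalised
        my $label = $parser->get_trimmed_text('/a');
        push @links, [ $href, $label ];
    }
    return @links;
}

# Read HTML from a file named on the command line, if one is given.
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $html = do { local $/; <$fh> };
    close $fh;
    printf "%s\t%s\n", @$_ for extract_links($html);
}
```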

    ---
    Schiller

    It's only my opinion and it doesn't have pretensions of absoluteness!

Re: Text from Website
by Popcorn Dave (Abbot) on Aug 03, 2004 at 23:59 UTC
    As other monks have pointed out, LWP::Simple and HTML::TokeParser should do exactly what you want.

    However, while you're testing your code, you will want to use your browser to save the original source of the page you're trying to parse, so that you don't hammer the server.
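
    One way to work from a saved copy during testing is sketched below. It assumes a local file name of your choosing: if the file already exists (for example, saved from the browser), it is parsed directly; otherwise the page is fetched once with getstore from LWP::Simple and reused afterwards. The sub names cached_page and page_title are made up for the example, and pulling out the title is just a stand-in for whatever data you actually need.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);
use HTML::TokeParser;

# Return the name of a local copy of $url, fetching it only if the file
# does not exist yet. (cached_page is a hypothetical helper name.)
sub cached_page {
    my ($url, $file) = @_;
    unless (-e $file) {
        my $status = getstore($url, $file);   # returns the HTTP status code
        die "Fetch of $url failed with status $status\n"
            unless is_success($status);
    }
    return $file;
}

# Parse the saved file and return the page title, or undef if there is none.
# (page_title is a hypothetical helper name.)
sub page_title {
    my ($file) = @_;
    my $parser = HTML::TokeParser->new($file)
        or die "Cannot open $file: $!\n";
    return $parser->get_tag('title')
        ? $parser->get_trimmed_text('/title')
        : undef;
}

# Usage: script.pl URL LOCAL_FILE
if (@ARGV == 2) {
    my $file  = cached_page(@ARGV);
    my $title = page_title($file);
    print defined $title ? "$title\n" : "(no title found)\n";
}
```

    Repeated test runs then read only the local file; delete it when you want a fresh copy from the server.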

    There is no emoticon for what I'm feeling now.