Yes, LWP; specifically, read perldoc lwptut and perldoc lwpcook, and see the excellent Perl and LWP (ISBN 0596001789) for more on "screen scraping" HTML.
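A minimal sketch of fetching a page with LWP::Simple, just to show how little code it takes (the URL below is only a placeholder):

    use strict;
    use warnings;
    use LWP::Simple;

    my $url  = 'http://www.example.com/';   # placeholder URL
    my $html = get($url)
        or die "Couldn't fetch $url";       # get() returns undef on failure
    print $html;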
See Re: HTML::Strip Problem for some sample code that probably does exactly what you want (get the page with LWP and strip the text with HTML::Parser). It also notes a few of the issues with screen scraping: redirects, meta refreshes, frames, and JavaScript can all work against you. Add spider-unfriendly sites to the list and you will soon want LWP::UserAgent and some wrapper code to get at the data you're after.
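Something along these lines is a rough sketch of that approach (the URL and user-agent string are placeholders): fetch the page with LWP::UserAgent, then let an HTML::Parser text handler collect just the text.

    use strict;
    use warnings;
    use LWP::UserAgent;
    use HTML::Parser;

    my $url = 'http://www.example.com/';        # placeholder URL

    my $ua = LWP::UserAgent->new(
        agent   => 'MyScraper/0.1',             # placeholder; identify yourself politely
        timeout => 30,
    );
    my $response = $ua->get($url);
    die "Request failed: ", $response->status_line
        unless $response->is_success;

    # Collect only the text content, ignoring markup.
    my $text   = '';
    my $parser = HTML::Parser->new(
        api_version => 3,
        text_h      => [ sub { $text .= shift }, 'dtext' ],
    );
    $parser->parse($response->decoded_content);
    $parser->eof;

    print $text;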
In addition, have a look at HTML::TokeParser. It is a very good module for parsing an HTML page and grabbing information according to the page structure.
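A small sketch of what that looks like in practice (the URL is a placeholder); this particular loop just walks the page and prints the text and href of every link:

    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::TokeParser;

    my $html = get('http://www.example.com/')   # placeholder URL
        or die "Couldn't fetch the page";

    my $p = HTML::TokeParser->new(\$html);

    # Pull out every <a> tag, its text, and its href attribute.
    while (my $token = $p->get_tag('a')) {
        my $href = $token->[1]{href} || '-';
        my $text = $p->get_trimmed_text('/a');
        print "$text -> $href\n";
    }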
---
Schiller
It's only my opinion and it doesn't have pretensions of absoluteness!
As other monks have pointed out, LWP::Simple and HTML::TokeParser should do exactly what you want.
However, you will want to use your browser to save the original source of the page you're trying to parse, so that you don't hammer the server while you're testing your code.
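One way to be gentle on the server while developing (a sketch only; the URL and filename are placeholders) is to mirror the page to a local file once and then parse that local copy with HTML::TokeParser on every test run:

    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::TokeParser;

    my $url  = 'http://www.example.com/';   # placeholder URL
    my $file = 'page.html';                 # local copy used while testing

    # mirror() only re-downloads when the remote copy is newer,
    # so repeated test runs don't hammer the server.
    my $rc = mirror($url, $file);
    die "Couldn't mirror $url: $rc"
        unless is_success($rc) || $rc == 304;   # 304 = not modified, local copy is current

    # HTML::TokeParser can read straight from the local file.
    my $p = HTML::TokeParser->new($file) or die "Can't parse $file: $!";
    if ($p->get_tag('title')) {
        print "Title: ", $p->get_trimmed_text('/title'), "\n";
    }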
There is no emoticon for what I'm feeling now.