in reply to Parsing HTML
As far as removing all HTML goes, I happen to like HTML::Strip. It's easy to use and produces pretty readable output. It has a habit of indenting a lot, but that's easy to strip out too if you want. Here's the synopsis from its POD:
    use HTML::Strip;
    my $hs = HTML::Strip->new();
    my $clean_text = $hs->parse( $raw_html );
    $hs->eof;
$clean_text will now contain the HTML-free version of $raw_html. It's as easy to use as LWP::Simple.
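If the extra indentation bothers you, something along these lines could tidy it up afterwards. This is only a sketch: tidy_whitespace is a name I made up for illustration (it is not part of HTML::Strip), it uses nothing but core Perl regexes, and the sample string just stands in for whatever the parser hands back.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of HTML::Strip: tidy the whitespace
# left behind once the tags have been removed.
sub tidy_whitespace {
    my ($text) = @_;
    $text =~ s/^[ \t]+//mg;      # drop leading indentation on every line
    $text =~ s/[ \t]+$//mg;      # drop trailing spaces and tabs on every line
    $text =~ s/[ \t]{2,}/ /g;    # collapse runs of spaces/tabs to one space
    $text =~ s/\n{3,}/\n\n/g;    # allow at most one blank line in a row
    $text =~ s/\A\s+|\s+\z//g;   # trim the start and end of the whole string
    return $text;
}

# Stand-in for the text returned by the stripper (assumed sample data).
my $clean_text = "   Hello   world\n\n\n\n     second line   \n";
print tidy_whitespace($clean_text), "\n";
```

Adjust the regexes to taste; for example, drop the blank-line rule if you want paragraphs left exactly as the parser produced them.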
Dave