in reply to Stripping HTML tags efficiently
I haven't benchmarked it myself, but I have used HTML::Strip to strip HTML from a document, and have found it to be effective and simple. The POD for the module claims that it is about five times faster than using regular expressions to strip HTML.
Here's how you do it:
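    use strict;
    use warnings;
    use LWP::Simple;
    use HTML::Strip;

    # get() returns undef on failure, so check before parsing
    my $raw_html = get( 'http://www.somewebsite.com' );
    die "Could not fetch the page\n" unless defined $raw_html;

    my $hs = HTML::Strip->new();
    my $clean_text = $hs->parse( $raw_html );
    $hs->eof;    # tell the parser this document is finished

    print $clean_text, "\n";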
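The call to eof matters most when you reuse one HTML::Strip object for several documents, since it resets the parser state between them. Here is a minimal sketch of that pattern, assuming the module is installed; the two HTML strings and the @texts array are made up for illustration and are not from any real page:

```perl
use strict;
use warnings;
use HTML::Strip;

# Hypothetical sample input; in practice these would be fetched pages.
my @documents = (
    '<p>First <b>page</b></p>',
    '<div>Second page</div>',
);

my $hs = HTML::Strip->new();
my @texts;

for my $html (@documents) {
    my $text = $hs->parse($html);
    $hs->eof;    # reset parser state before the next document
    push @texts, $text;
}

print "$_\n" for @texts;
```

Creating the object once and calling eof after each document avoids constructing a new parser per page, which may help if you are stripping many documents in a loop.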
Dave