Re: Stripping of HTML content

As Molt says, parsing HTML with regexes is very fragile and you'd be better off using a real HTML parser to do this.

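Here's a simple example using HTML::Parser. It reads the HTML from the files named on the command line (or from STDIN) and prints just the text content.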


use warnings;
use strict;
use HTML::Parser;

my $html = do { local $/; <> };

my @text;
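# Each text event is pushed onto @text as an array ref holding the decoded text.
my $p = HTML::Parser->new(text_h => [\@text, 'dtext']);

$p->parse($html);
$p->eof;    # signal end of document so any buffered trailing text is flushed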

print map { $_->[0] } @text;
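
To illustrate the fragility point, here is a small sketch comparing a naive regex strip with the parser approach. The sample string and the variable names ($naive, $parsed) are my own assumptions, made up for illustration; the only HTML::Parser calls used are new, parse and eof.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input: the title attribute contains a literal '>'.
my $html = '<p><a title="a > b" href="/x">link</a> &amp; more</p>';

# Naive regex approach: treats the first '>' it sees as the end of the tag.
(my $naive = $html) =~ s/<[^>]*>//g;

# Parser approach: collect decoded text events, then flush with eof.
my @text;
my $p = HTML::Parser->new(text_h => [\@text, 'dtext']);
$p->parse($html);
$p->eof;
my $parsed = join '', map { $_->[0] } @text;

print "regex:  $naive\n";
print "parser: $parsed\n";
```

The regex version leaves part of the tag (the href attribute) in its output and does not decode the entity, whereas the parser version should yield only the text. Note that script and style content is also reported as text by default, so you may want to filter those out separately.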

--
<http://www.dave.org.uk>

"The first rule of Perl club is you do not talk about Perl club."
-- Chip Salzenberg
