in reply to Stripping of HTML content
As Molt says, parsing HTML with regexes is very fragile and you'd be better off using a real HTML parser to do this.
Here's a simple example using HTML::Parser.
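    use warnings;
    use strict;
    use HTML::Parser;

    # Slurp the whole document from STDIN or the files named on the command line
    my $html = do { local $/; <> };

    # With an array ref as the handler, each text event is pushed
    # onto @text as [ $decoded_text ]
    my @text;
    my $p = HTML::Parser->new(text_h => [\@text, 'dtext']);
    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering

    print map { $_->[0] } @text;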
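One caveat with the example above: HTML::Parser reports the contents of script and style elements as text events too, so any JavaScript or CSS in the page ends up in the output. Here is a rough sketch of one way around that, using the parser's ignore_elements method. The strip_html sub and the sample input are my own invention for illustration, not something from the original question.

```perl
use warnings;
use strict;
use HTML::Parser;

# Hypothetical helper (not from the original post): returns the text of an
# HTML string, leaving out everything inside <script> and <style>.
sub strip_html {
    my ($html) = @_;
    my @text;
    my $p = HTML::Parser->new(text_h => [\@text, 'dtext']);
    $p->ignore_elements(qw(script style));    # drop these elements and their content
    $p->parse($html);
    $p->eof;                                  # flush any buffered text
    return join '', map { $_->[0] } @text;
}

print strip_html('<p>Hello <b>world</b> &amp; all</p><script>var x = 1;</script>'), "\n";
```

Because the handler asks for dtext, entities such as &amp;amp; come back decoded. If you need them left alone, ask for text instead.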
"The first rule of Perl club is you do not talk about
Perl club."
-- Chip Salzenberg