I had to do the same thing a couple of days ago. Here is my HTML::Parser solution (to complement tilly's HTML::TokeParser version above). It encodes any special characters found in the text portion of an HTML doc.
use HTML::Parser;
use HTML::Entities;
my $html = '<div align="center">Your "HTML" page goes here</div>';
my $enc = '';
my $p = HTML::Parser->new(
unbroken_text => 1,
default_h => [ sub { $enc .= join('', @_) }, "text" ],
text_h => [ sub { $enc .= HTML::Entities::encode_entities($_[0]) }, "text" ],
);
$p->parse($html);
$p->eof;  # signal end of input so any text still buffered by unbroken_text is flushed
print $enc;
Handling JavaScript is left as an exercise for the reader ;)
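For anyone who wants a head start on that exercise, here is a rough sketch of one possible approach, not a tested production solution. It relies on the "is_cdata" argspec of HTML::Parser, which should be true for text inside literal elements such as script and style, and passes that text through unencoded. The sample input is made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities;

# Hypothetical sample input: one ordinary text node and one script block.
my $html = '<p>Fish & chips</p><script>if (a && b) { run(); }</script>';
my $enc  = '';

my $p = HTML::Parser->new(
    api_version   => 3,
    unbroken_text => 1,
    # Tags, comments, declarations and so on are copied through unchanged.
    default_h => [ sub { $enc .= $_[0] }, "text" ],
    # Text is encoded unless the parser reports it as CDATA-like content
    # (assumption: this covers script and style bodies).
    text_h => [ sub {
        my ($text, $is_cdata) = @_;
        $enc .= $is_cdata ? $text : encode_entities($text);
    }, "text, is_cdata" ],
);

$p->parse($html);
$p->eof;    # flush anything still buffered
print $enc, "\n";
```

The script body should come out untouched while the ordinary text is encoded. Inline event attributes such as onclick are part of the start tag, so they are already passed through by the default handler.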
- Cees