
I had to do the same thing a couple of days ago. Here is my HTML::Parser solution (to complement tilly's HTML::TokeParser version above). It encodes any special characters found in the text portions of an HTML document, while copying the tags themselves through unchanged.

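use strict;
use warnings;
use HTML::Parser;
use HTML::Entities;

my $html = '<div align="center">Your "HTML" page goes here</div>';
my $enc  = '';

my $p = HTML::Parser->new(
    unbroken_text => 1,
    # Tags, comments, declarations etc.: copy the original source through as-is
    default_h => [ sub { $enc .= join('', @_) }, "text" ],
    # Text between tags: encode <, >, &, quotes and control/high-bit chars
    text_h    => [ sub { $enc .= HTML::Entities::encode_entities($_[0]) }, "text" ],
);
$p->parse($html);
$p->eof;      # flush any text still buffered (unbroken_text holds it back)
print $enc;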

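Handling JavaScript is left as an exercise for the reader ;) Note that HTML::Parser passes the contents of <script> elements to the text handler, so as written this will encode your JavaScript as well.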
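For anyone attempting that exercise, here is one possible starting point. It is a rough sketch and has not been tried on real-world pages. HTML::Parser can pass an is_cdata flag to a handler, and that flag is true for text inside literal elements such as <script> and <style>. The text handler can check it and copy that text through untouched instead of encoding it. The sample input is made up for illustration.

```perl
use strict;
use warnings;
use HTML::Parser;
use HTML::Entities qw(encode_entities);

# Made-up sample: the <p> text needs encoding, the script body must not be touched.
my $html = '<p>Fish & "Chips"</p><script>if (a < b && c) { go(); }</script>';
my $enc  = '';

my $p = HTML::Parser->new(
    unbroken_text => 1,
    # Tags, comments, declarations: pass the original source through.
    default_h => [ sub { $enc .= $_[0] }, "text" ],
    # Text events also receive is_cdata, which is true inside literal
    # elements such as <script> and <style>; copy those verbatim.
    text_h => [
        sub {
            my ($text, $is_cdata) = @_;
            $enc .= $is_cdata ? $text : encode_entities($text);
        },
        "text, is_cdata",
    ],
);
$p->parse($html);
$p->eof;    # flush buffered text
print $enc, "\n";
```

With this input, the text inside the <p> element is entity-encoded and the script body is left exactly as written. Inline event handlers such as onclick="..." are attributes, so they arrive through default_h and are copied unchanged either way.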

- Cees