in reply to HTML::TokeParser not stripping entities and xhtml

HTML::TokeParser is great, but you can save yourself some pain by switching to HTML::TokeParser::Simple. If you just want to strip out the malformed end tags (the ones written like </tr />), here's a first try:

#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

my $html = '</tr /> </tbody /> </table /></p> <p>We have&nbsp;different groups '
         . 'to help you through the buying process.&nbsp;Our team of counselors '
         . 'and volunteers can provide transportation, and childcare.&nbsp;</p> '
         . '<p>&nbsp;</p>';

my $result = '';
my $p = HTML::TokeParser::Simple->new(\$html);

while ( my $token = $p->get_token ) {
    my $text = $token->as_is;    # the token's original source text
    if ( $token->is_tag ) {
        # skip end tags written with a trailing slash, e.g. "</tr />"
        next if $text =~ /^<\/.*\/>$/;
    }
    $result .= $text;
}

print $result;
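If you want to see exactly which tags that regex throws away, here's a quick check of the same pattern against a few sample tags. It uses plain core Perl with no modules; the helper name is_malformed_end_tag and the sample tags are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the token loop: an end tag ("</...") that also
# closes with a self-closing slash ("/>"), e.g. "</tr />".
# is_malformed_end_tag is a hypothetical helper, not part of any module.
sub is_malformed_end_tag {
    my ($tag) = @_;
    return $tag =~ /^<\/.*\/>$/ ? 1 : 0;
}

# Illustrative tag strings, shaped like what as_is returns for tag tokens.
for my $tag ( '</tr />', '</table />', '</p>', '<p>', '<br />' ) {
    printf "%-10s %s\n", $tag, is_malformed_end_tag($tag) ? 'dropped' : 'kept';
}
```

Note that <br /> survives: it's a legitimate XHTML empty element, and it doesn't start with </, so the pattern leaves it alone.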

Cheers,
Ovid
