Even though HTML::TokeParser is great, you can alleviate some pain by switching to HTML::TokeParser::Simple. If you just want to strip out the malformed tags, here's a first try:
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

my $text = '</tr /> </tbody /> </table /></p> <p>We have different groups to help you through the buying process. Our team of counselors and volunteers can provide transportation, and childcare. </p> <p> </p>';

my $result = '';
my $p      = HTML::TokeParser::Simple->new(\$text);

while ( my $token = $p->get_token ) {
    my $raw = $token->as_is;    # the token's original text, unchanged

    # Drop malformed XHTML-style end tags such as "</tr />"
    if ( $token->is_tag ) {
        next if $raw =~ /^<\/.*\/>$/;
    }
    $result .= $raw;
}

print $result;
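To see exactly which tags the `next if` test throws away, here is a small core-Perl sketch that applies the same pattern to a few tag strings. The helper name `is_malformed_end_tag` is made up for illustration, and the `<br />` sample is added to show that ordinary XHTML empty elements are not caught by the filter; only an end tag (`</...`) that also ends in `/>` matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the same test the parser loop uses on tag tokens.
# True only for an end tag ("</...") that also ends in a self-closing "/>".
sub is_malformed_end_tag {
    my ($tag_text) = @_;
    return $tag_text =~ /^<\/.*\/>$/ ? 1 : 0;
}

# Sample tag strings as as_is() would return them ('<br />' is illustrative).
my @samples = ( '</tr />', '</tbody />', '</p>', '<p>', '<br />' );

for my $tag (@samples) {
    printf "%-10s %s\n", $tag, is_malformed_end_tag($tag) ? 'dropped' : 'kept';
}
```

This prints `dropped` for `</tr />` and `</tbody />`, and `kept` for `</p>`, `<p>` and `<br />`.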
Cheers,
Ovid
In reply to "Re: HTML::TokeParser not stripping entities and xhtml" by Ovid, in thread "HTML::TokeParser not stripping entities and xhtml" by bradcathey.