in reply to HTML::TokeParser not stripping entities and xhtml
HTML::TokeParser is great, but you can save yourself some pain by switching to HTML::TokeParser::Simple. If you just want to strip out the malformed closing tags (the ones like </tr /> that also end in "/>"), here's a first try:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use HTML::TokeParser::Simple;

    my $text = '</tr /> </tbody /> </table /></p> '
             . '<p>We have different groups to help you through the buying process. '
             . 'Our team of counselors and volunteers can provide transportation, '
             . 'and childcare. </p> <p> </p>';

    my $result = '';
    my $p = HTML::TokeParser::Simple->new(\$text);
    while ( my $token = $p->get_token ) {
        my $raw = $token->as_is;    # the token exactly as it appeared in the source
        if ( $token->is_tag ) {
            # skip tags that both open with "</" and close with "/>"
            next if $raw =~ /^<\/.*\/>$/;
        }
        $result .= $raw;
    }
    print $result;
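If you want to see which tags that regex catches before running it over real markup, a small standalone check along these lines may help. It uses only core Perl; the helper name is_malformed_close and the sample tags are made up for illustration and are not part of HTML::TokeParser::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in the script above: starts with "</" and ends with "/>".
my $malformed = qr{^</.*/>$};

# Hypothetical helper: returns 1 if the tag would be skipped, 0 otherwise.
sub is_malformed_close {
    my ($tag) = @_;
    return $tag =~ $malformed ? 1 : 0;
}

for my $tag ( '</tr />', '</table />', '</p>', '<p>', '<br />' ) {
    printf "%-12s %s\n", $tag, is_malformed_close($tag) ? 'dropped' : 'kept';
}
```

Note that "." does not match a newline without the /s modifier, so a malformed tag that is split across lines would slip past this pattern as written.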
Cheers,
Ovid
New address of my CGI Course.