This problem looks tailor-made for my HTML::TokeParser::Simple module, when combined with HTML::Tagset. The following test will demonstrate:
#!/usr/bin/perl -w
use strict;
use HTML::TokeParser::Simple;
use HTML::Tagset;

my $html = <<'END_HTML';
<a href="mylink">text1</a>
<this is normal text>
END_HTML

my $p = HTML::TokeParser::Simple->new( \$html );
while ( my $token = $p->get_token ) {
    next if ! $token->is_text
        and exists $HTML::Tagset::isKnown{ $token->return_tag };
    print $token->return_text;
}
Result:
text1 <this is normal text>
Cheers,
Ovid