Hey Monks!
I have a string that contains an HTML file. What I'd like to do is first decode any HTML entities contained in the text (only!) and then encode the text with entities that I can specify. What I want returned is the entire string, in the same order that it was in, with just the text encoded with HTML entities.
I have assumed that using a combination of HTML::Parser and HTML::Entities is the best way to achieve my goal, but if you have a better way, then let me hear it.
Anyhow, does anyone know how to do this? I don't have much experience with HTML::Parser, and its documentation doesn't make it clear to me how to approach this.
Thanks
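For comparison, here is a minimal sketch of the same job done directly with HTML::Parser, which is what the question asks about. It is not the poster's code: `recode_text_only` is a hypothetical helper name, and the default set of characters to encode (`<>&"`) is an assumption. It relies on HTML::Parser's `dtext` argspec, which hands the text handler the text with entities already decoded, so the "decode first" step comes for free. The `is_cdata` argspec is used to leave script/style content untouched, and a `default_h` handler copies all other markup through verbatim via the `text` argspec.

```perl
use strict;
use warnings;
use HTML::Parser ();
use HTML::Entities qw(encode_entities);

# Hypothetical helper: decode, then re-encode, only the text content of an
# HTML string, leaving tags, attributes and comments exactly as they were.
# $unsafe is an encode_entities character class; '<>&"' is an assumed default.
sub recode_text_only {
    my ( $html, $unsafe ) = @_;
    $unsafe = '<>&"' unless defined $unsafe;
    my $out = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # 'dtext' is the text with entities decoded; 'is_cdata' is true inside
        # script/style and similar literal elements, whose text arrives raw
        # and is copied through without encoding.
        text_h => [
            sub {
                my ( $dtext, $is_cdata ) = @_;
                $out .= $is_cdata ? $dtext : encode_entities( $dtext, $unsafe );
            },
            'dtext, is_cdata',
        ],
        # Every other event (tags, comments, declarations) is copied verbatim.
        default_h => [ sub { $out .= $_[0] if defined $_[0] }, 'text' ],
    );
    $p->parse($html);
    $p->eof;
    return $out;
}

print recode_text_only(qq{<p class="x">Tom &amp; Jerry say "hi" &lt;3</p>\n});
```

Because only text events are re-encoded, attribute values such as `class="x"` keep their original quoting; pass a different second argument to choose which characters end up as entities.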
Update
I used the HTML::TokeParser::Simple module and HTML::Entities to get the solution:
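```perl
use strict;
use warnings;
use HTML::Entities;
use HTML::TokeParser::Simple;

# Slurp the whole HTML file into one string (file name from the command line).
my $file = shift or die "usage: $0 file.html\n";
open my $fh, '<', $file or die "Can't open $file: $!";
my $html = do { local $/; <$fh> };
close $fh;

my $parsed = parseHTML($html);
print $parsed;

sub parseHTML {
    my $html   = shift;
    my $parsed = '';
    my $p      = HTML::TokeParser::Simple->new( \$html );
    while ( my $token = $p->get_token ) {
        if ( $token->is_text ) {
            # Text between tags: encode the characters listed in the second
            # argument (a character class; here " and ,) as entities.
            my $text = $token->as_is;
            encode_entities( $text, '",' );
            $parsed .= $text;
        }
        else {
            # Tags, comments, declarations, etc. are copied through unchanged.
            $parsed .= $token->as_is;
        }
    }
    return $parsed;
}
```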
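The solution encodes the text but does not decode it first, which the original question asked for. Below is a minimal sketch of the two HTML::Entities steps on a single text string; the input is made up for illustration. `decode_entities` turns entities back into characters, and the second argument of `encode_entities` is a regex character class naming which characters to encode. With `'",'` only double quotes and commas are touched, so a decoded `&` or `<` would be left raw; including `<>&` keeps the result safe to put back inside HTML.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# Illustrative text node as it might come out of the parser, still encoded.
my $raw = 'Fish &amp; chips, &quot;fresh&quot;';

# Step 1: decode entities back into plain characters.
# Called in non-void context, decode_entities returns a decoded copy.
my $plain = decode_entities($raw);

# Step 2: re-encode only the characters in the given character class.
my $only_quotes_commas = encode_entities( $plain, '",' );    # '&' stays raw
my $markup_safe        = encode_entities( $plain, '<>&"' );  # '&' becomes &amp;

print "$plain\n$only_quotes_commas\n$markup_safe\n";
```

Inside the token loop, the same idea amounts to running `decode_entities` on `$token->as_is` before calling `encode_entities`.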
Thanks!
In reply to How to use HTML::Parser to encode text with HTML entities? by locust