I'm not sure why that would be a problem. Can you decode those entities and then turn around and re-encode them? Further, your code would have the same issue.
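A minimal sketch of that decode-then-re-encode idea, assuming HTML::Entities is installed. The `reencode` helper is a hypothetical name used only for illustration; it is not from your code. Decoding first means text that already contains entities does not get encoded a second time.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper: decode any existing entities, then encode the result,
# so already-encoded text is not double-encoded.
sub reencode {
    my ($text) = @_;
    my $decoded = decode_entities($text);
    return encode_entities($decoded);
}

# Both the raw and the already-encoded form come out the same.
print reencode('AT&T'),     "\n";    # AT&amp;T
print reencode('AT&amp;T'), "\n";    # AT&amp;T
```

Run over text tokens only, something like this should leave existing entities as they are while still catching bare symbols.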
In your root post, you wrote: "if a defined symbol is not inside of < > then match". I'm not sure exactly what you mean. Do you mean that you don't want to encode anything that's already in a tag? The following untested snippet takes the name of an HTML document as its argument, backs the file up with a .bak extension, and rewrites the original with the text outside of tags entity-encoded.
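use strict;
use warnings;
use HTML::TokeParser::Simple;
use HTML::Entities;
use File::Copy;

my $new_html  = '';
my $orig_html = shift || die "Usage: $0 some.html";

# Keep a backup, since the original file is overwritten below.
copy( $orig_html, "${orig_html}.bak" )
    or die "Could not copy ($orig_html): $!";

my $parser = HTML::TokeParser::Simple->new($orig_html);
while ( my $token = $parser->get_token ) {
    # Pass tags, comments and declarations through untouched;
    # only plain text gets encoded.
    unless ( $token->is_text ) {
        $new_html .= $token->as_is;
        next;
    }
    $new_html .= encode_entities( $token->as_is );
}

# Write the result back over the original file.
open my $out, '>', $orig_html
    or die "Cannot open ($orig_html) for writing: $!";
print $out $new_html;
close $out
    or die "Cannot close ($orig_html): $!";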
The above code is untested. Further, if you have $HTML::Parser::VERSION < 3.25, this will not parse XHTML correctly.
Cheers,
Ovid
New address of my CGI Course.
Silence is Evil (feel free to copy and distribute widely - note copyright text)