Re: Re: A quick regex to (imperfectly) entity encode some HTML?

Re: Re: Re: A quick regex to (imperfectly) entity encode some HTML? by Ovid (Cardinal) on Mar 08, 2003 at 01:42 UTC
I'm not sure why that would be a problem. Can you decode those entities and then turn around and re-encode them? Further, your code would have the same issue.

In your root post, you wrote: "if a defined symbol is not inside of < > then match". I'm not sure exactly what you mean. Do you mean that you don't want to encode anything that's already in a tag? The following untested snippet takes the name of an HTML document as its argument.

    use HTML::TokeParser::Simple;
    use HTML::Entities;
    use File::Copy;

    my $new_html  = '';
    my $orig_html = shift || die "Usage: $0 some.html";

    # Keep a backup before overwriting the original file.
    copy( $orig_html, "${orig_html}.bak" )
        or die "Could not copy ($orig_html): $!";

    my $parser = HTML::TokeParser::Simple->new($orig_html);
    while ( my $token = $parser->get_token ) {
        if ( $token->is_tag ) {
            # Tags pass through untouched.
            $new_html .= $token->as_is;
            next;
        }
        # Everything else gets entity-encoded.
        $new_html .= encode_entities( $token->as_is );
    }

    open OUTPUT, "> $orig_html"
        or die "Cannot open ($orig_html) for writing: $!";
    print OUTPUT $new_html;
    close OUTPUT;

The above code is untested. Further, if you have `$HTML::Parser::VERSION < 3.25`, this will not parse XHTML correctly.

Cheers,
Ovid

New address of my CGI Course. Silence is Evil (feel free to copy and distribute widely - note copyright text)
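The decode-then-re-encode idea mentioned above could look roughly like the sketch below. It assumes HTML::Entities is installed; the helper name `reencode` and the sample string are hypothetical, not from the original posts. Decoding first means text that already contains entities is not encoded a second time.

```perl
use strict;
use warnings;
use HTML::Entities qw(decode_entities encode_entities);

# Hypothetical helper: normalise text that may already contain entities.
sub reencode {
    my ($text) = @_;

    # Turn any existing entities back into literal characters
    # (in non-void context decode_entities returns a decoded copy) ...
    my $plain = decode_entities($text);

    # ... then encode once, restricted to the characters < > & and ".
    return encode_entities( $plain, '<>&"' );
}

# The already-encoded "&amp;" and the bare "&" both end up as "&amp;".
print reencode('AT&amp;T & sons < 5'), "\n";
```

One caveat with this approach: text that deliberately displays an entity literally (for example `&amp;lt;`) would be changed by the round trip, so it only suits input where that does not matter.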
Re: Re: Re: Re: A quick regex to (imperfectly) entity encode some HTML? by Anonymous Monk on Mar 08, 2003 at 06:04 UTC
OK. Thanks for your help.