in reply to A quick regex to (imperfectly) entity encode some HTML?

The easy way using HTML::Entities:

use HTML::Entities;

encode_entities( $some_var );
# or
decode_entities( $some_var );

See the documentation for extra features.
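
One of those extra features is an optional second argument to encode_entities that names the characters to treat as unsafe. The following is a minimal sketch, assuming HTML::Entities is installed; the sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;
use HTML::Entities;

# Hypothetical sample text, not from the original question
my $text = 'AT&T says 1 < 2';

# Default behaviour: control characters, high-bit characters,
# and < & > ' " are all encoded. A copy is returned here because
# the call is not in void context.
my $encoded = encode_entities($text);
print "$encoded\n";       # AT&amp;T says 1 &lt; 2

# The second argument limits encoding to the listed characters
my $only_angle = encode_entities( $text, '<>' );
print "$only_angle\n";    # AT&T says 1 &lt; 2

# decode_entities reverses the encoding
print decode_entities($encoded), "\n";    # AT&T says 1 < 2
```

When called in void context, as in the one-liner above, both functions modify their argument in place instead of returning a copy.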

Cheers,
Ovid

New address of my CGI Course.
Silence is Evil (feel free to copy and distribute widely - note copyright text)

Re: Re: A quick regex to (imperfectly) entity encode some HTML?
by Anonymous Monk on Mar 08, 2003 at 00:49 UTC
    Thanks for the quick reply. This would be the easy way if the text didn't already have (X)HTML tags in it. I'm looking for "wisdom" on working around existing tags efficiently.

      I'm not sure why that would be a problem. Can you decode those entities and then turn around and re-encode them? Further, your code would have the same issue.
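
      To illustrate the decode-then-re-encode idea, here is a minimal sketch, assuming HTML::Entities is installed. The input string is hypothetical and contains no tags; it only shows how text that mixes encoded and unencoded ampersands behaves.

      ```perl
      use strict;
      use warnings;
      use HTML::Entities;

      # Hypothetical input: one ampersand is already encoded, one is not
      my $mixed = 'Fish &amp; chips & peas';

      # Encoding directly double-encodes the existing entity
      my $double = encode_entities($mixed);
      print "$double\n";    # Fish &amp;amp; chips &amp; peas

      # Decoding first makes the text uniform, so re-encoding is safe
      my $clean = encode_entities( decode_entities($mixed) );
      print "$clean\n";     # Fish &amp; chips &amp; peas
      ```

      One caveat: if the text deliberately contains a literal "&amp;" that is meant to be displayed as written, decoding first will lose that distinction.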

      In your root post, you wrote: "if a defined symbol is not inside of < > then match". I'm not sure exactly what you mean. Do you mean that you don't want to encode anything that's already in a tag? The following untested snippet takes the name of an HTML document as its argument.

      use HTML::TokeParser::Simple;
      use HTML::Entities;
      use File::Copy;

      my $new_html  = '';
      my $orig_html = shift || die "Usage: $0 some.html";

      # Keep a backup, since the original file is overwritten below.
      copy( $orig_html, "${orig_html}.bak" )
          or die "Could not copy ($orig_html): $!";

      my $parser = HTML::TokeParser::Simple->new($orig_html);
      while ( my $token = $parser->get_token ) {
          if ( $token->is_tag ) {
              # Tags pass through untouched.
              $new_html .= $token->as_is;
              next;
          }
          # Everything else is entity encoded.
          $new_html .= encode_entities( $token->as_is );
      }

      open OUTPUT, "> $orig_html"
          or die "Cannot open ($orig_html) for writing: $!";
      print OUTPUT $new_html;
      close OUTPUT;

      The above code is untested. Further, if you have $HTML::Parser::VERSION < 3.25, this will not parse XHTML correctly.

      Cheers,
      Ovid

        OK. Thanks for your help.