Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How do I get HTML::Entities encode_entities to also encode [ ] for perlmonks?

Re: HTML::Entities encode_entities for perlmonks [ ]
by kcott (Archbishop) on Jun 15, 2013 at 09:35 UTC

    Using the HTML::Entities documentation:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;
    use utf8;
    binmode(STDOUT => ':utf8');

    use HTML::Entities;

    my $unsafe_chars = "<&>'\"[]\200-\377";
    my $string = "<[Here's my \"2¢\" worth]>";

    print $string;
    print encode_entities($string, $unsafe_chars);

    Output:

    $ pm_html_ent_plus_brackets.pl
    <[Here's my "2¢" worth]>
    &lt;&#91;Here&#39;s my &quot;2&cent;&quot; worth&#93;&gt;

    Update: Oops! just noticed &Acirc; in the output (just before &cent;). Fixed by adding:

    use utf8;
    binmode(STDOUT => ':utf8');

    • use utf8; — because source code contains UTF-8, i.e. the ¢ character.
    • binmode(STDOUT => ':utf8'); — doesn't change any of the character entity references but, without it, print $string; gives <[Here's my "2?" worth]> (note the ? instead of ¢) now that use utf8; has been added.

    -- Ken
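
    For anyone wondering where the stray &Acirc; came from, here is a minimal sketch (a reconstruction, not Ken's code). It assumes that without use utf8 Perl reads the ¢ in the source as its two UTF-8 bytes, 0xC2 and 0xA2, and that each byte is then encoded as a separate Latin-1 character. The variable names are made up for illustration.

    ```perl
    use strict;
    use warnings;
    use HTML::Entities;

    # Without "use utf8", the literal ¢ is seen as two bytes: 0xC2 0xA2.
    my $as_bytes = "\xC2\xA2";
    # With "use utf8", it is decoded to the single character U+00A2.
    my $as_char  = "\x{A2}";

    my $from_bytes = encode_entities($as_bytes);   # &Acirc;&cent;
    my $from_char  = encode_entities($as_char);    # &cent;

    print "$from_bytes\n$from_char\n";
    ```

    The first line of output shows the extra &Acirc; that the update describes; the second shows the result once the source is treated as UTF-8.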

      Thanks, that looks almost complete. I think the control characters are missing. I tried using [:cntrl:] but that didn't work. OTOH, using my $unsafe_chars = q{\x00-\x1f<&>'"[]\200-\377}; worked.

        Fair comment. I hadn't previously used this module so, as indicated, I was working from the doco: I missed the "control chars" part of:

        The default set of characters to encode are control chars, high-bit chars, and the <, &, >, ' and " characters.

        Also, note the update regarding the utf8 pragma.

        -- Ken
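
        One caveat about the \x00-\x1f range, as a hedged sketch: as far as I can tell from the module's source, the default set leaves newline, carriage return and tab alone, whereas \x00-\x1f encodes them too. Likewise, \200-\377 stops at U+00FF, while the default appears to encode anything above that as well. So the hand-built set is close to, but not identical to, "default plus brackets". Treat this as an assumption to check against your installed version.

        ```perl
        use strict;
        use warnings;
        use HTML::Entities;

        my $unsafe_chars = q{\x00-\x1f<&>'"[]\200-\377};
        my $string       = "[a]\n<b>";

        # Default set: brackets untouched; the newline is assumed to survive.
        my $default = encode_entities($string);
        # Hand-built set: brackets encoded, and the newline is encoded too.
        my $custom  = encode_entities($string, $unsafe_chars);

        print "default: $default\n";
        print "custom:  $custom\n";
        ```

        If the newline handling matters for your posts, compare the two lines of output before settling on a character set.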

Re: HTML::Entities encode_entities for perlmonks [ ]
by space_monk (Chaplain) on Jun 15, 2013 at 08:49 UTC

    Update: Do two runs of encode_entities: the first encodes the default set of characters, the second encodes your special characters. This may be a little slower than a single pass, but it means you don't have to spell out everything else that needs encoding.

    use HTML::Entities;

    my $string = '[I haz square brackets]';
    my $unsafe_chars = '[]';
    # first pass encodes default set
    my $pass1 = encode_entities( $string );
    # second pass encodes speshul chars
    my $encoded = encode_entities( $pass1, $unsafe_chars );

    If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

      Nice try

      #!/usr/bin/perl --
      use strict;
      use warnings;
      use HTML::Entities;

      my $string = '[< smarto >]';
      my $unsafe_chars = '[]';
      my $encoded = encode_entities( $string, $unsafe_chars );
      print "$string\n$encoded\n";
      __END__
      [< smarto >]
      &#91;< smarto >&#93;

      The question is what to add to encode_entities so that it also encodes [ ] -- I want what the default encode_entities does, plus [ ].

        Next time be a bit more precise with your question! :-)

        You could simply run a first pass of encode_entities with no params so it would encode the default values, and the second pass with any special characters you also need encoding. I've amended my original answer accordingly.

        If you spot any bugs in my solutions, it's because I've deliberately left them in as an exercise for the reader! :-)

        Why not just encode the other characters manually?

        use HTML::Entities;

        my $string = '[< smarto >]';
        my $unsafe_chars = '[]';
        # /r (Perl 5.14+) returns the modified copy instead of editing in place
        my $encoded = encode_entities($string)
            =~ s{([\Q$unsafe_chars\E])}{sprintf "&#%d;", ord $1}egr;
        print "$string\n$encoded\n";
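
        The two-pass idea from this thread can be wrapped in a small reusable sub. This is a minimal sketch: encode_for_pm is a hypothetical helper name, not part of HTML::Entities. The second pass should be safe because the entities produced by the first pass contain neither [ nor ], and & is not in the second set, so nothing gets double-encoded.

        ```perl
        use strict;
        use warnings;
        use HTML::Entities;

        # Hypothetical helper: default encoding first, then square brackets.
        sub encode_for_pm {
            my ($text) = @_;
            my $encoded = encode_entities($text);     # default unsafe set
            return encode_entities($encoded, '[]');   # then only [ and ]
        }

        my $result = encode_for_pm('[< smarto >]');
        print "$result\n";   # &#91;&lt; smarto &gt;&#93;
        ```

        Compared with a hand-built character class, this keeps whatever the module treats as its default set and only adds the brackets on top.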