in reply to Re: HTML::Entities - encode all non-alphanumeric and foreign chars?
in thread HTML::Entities - encode all non-alphanumeric and foreign chars?

Hadn't imagined it would take regex elements...

The first two

$encoded = encode_entities($input, '\W');
$encoded = encode_entities($input, '^\w');
wouldn't work for me. But I tried
$encoded = encode_entities($input, '\\W'); # note double backslash
and that did work, with one little picky issue - it was encoding every whitespace char as well, which, while not technically bothersome, is just not needed.

So I tried the last formulation with a space added to the list - had to add it as a simple typed space - it wouldn't accept a \s:

$encoded = encode_entities($input, '^a-zA-Z0-9_ ');
and that does it perfectly.

Thanks.

Re^3: HTML::Entities - encode all non-alphanumeric and foreign chars?
by Sidhekin (Priest) on Sep 23, 2007 at 20:15 UTC

    $encoded = encode_entities($input, '\\W'); # note double backslash

    Single backslash works for me. Sure you weren't trying with a double-quoted string?

    ('\w', '\\w', "\\w" should all be the same string, \w — whereas "\w" is just w.)

    Oh, and the same goes for \s. It should Just Work in a single-quoted string, but in a double-quoted string, you'll need to double the backslash.

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!
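The quoting difference described above can be checked directly. This is only a small sketch using core Perl; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# In single quotes only \\ and \' are special, so '\w' keeps its backslash.
my $single        = '\w';
my $single_escape = '\\w';   # '\\' collapses to a single backslash
my $double_escape = "\\w";   # same inside double quotes

# In double quotes "\w" is not a recognised escape, so Perl drops the
# backslash (and warns about it, which is silenced here for the demo).
my $double_plain = do { no warnings 'misc'; "\w" };

for my $pair (
    [ q{'\w'},   $single ],
    [ q{'\\\\w'}, $single_escape ],
    [ q{"\\\\w"}, $double_escape ],
    [ q{"\w"},   $double_plain ],
) {
    my ($label, $value) = @$pair;
    printf "%-8s => %-3s (length %d)\n", $label, $value, length $value;
}
```

So a pattern that only works with a doubled backslash is a good hint that it is sitting in a double-quoted string somewhere.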

      Yes, absolutely right - double quotes. Replacing with single quotes makes '\W' work like a charm. Thanks.

      But, just to be finicky and difficult, '\W\s' is still converting spaces to &#32;.

      UPDATE:

      Ya, of course it was. This is the list of UNSAFE characters to be encoded. So if I include '\W\s', that specifically tells it to encode spaces. What I want is '^\w\s' - anything that's not a word char or a space. Works perfectly now.

      UPDATE 2

      OK, now this is very cool. With this formulation, I can create a very well defined list of what is and is not to be encoded. For example (what I'm using):

      $encoded = encode_entities($input, '^\w\s.\-');
      encodes everything that is NOT a word char, or a space, or a period, or a dash (the backslash is needed to escape the dash 'cause a dash means a range in the character-class syntax the module uses for this list).
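For comparison, here is a short sketch of the three unsafe-character lists tried in this thread, run against one sample string. It assumes HTML::Entities (part of the HTML-Parser distribution) is installed; the sample input and variable names are invented for the example.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Hypothetical sample input with spaces, a period, a dash and angle brackets.
my $input = 'a b.c-d <e>';

# The second argument is the body of a character class listing UNSAFE chars.
my $all_nonword = encode_entities($input, '\W');        # spaces get encoded too
my $keep_space  = encode_entities($input, '^\w\s');     # spaces left alone
my $keep_more   = encode_entities($input, '^\w\s.\-');  # also keeps . and -

print "\\W        : $all_nonword\n";
print "^\\w\\s     : $keep_space\n";
print "^\\w\\s.\\-  : $keep_more\n";   # a b.c-d &lt;e&gt;
```

The leading ^ negates the whole list, so everything after it names characters to leave untouched rather than characters to encode.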