in reply to HTML::Entities not encoding @ or .

...not escaped by default...

...not on the list...

Hmmm - so why is it that when I do:

foreach $val (keys %char2entity) { print "<br>$val => $char2entity{$val}\n"; }
This outputs:

... <br>@ => &#64; ... <br>. => &#46;
Is %char2entity not the list of characters that are encoded by default?


Update:

Answer to my own question - no, I don't think %char2entity is the list of characters to encode by default.

If I take Anonymous Monk's suggestion:

encode_entities($string, "\000-\377")
then every single character in my string gets encoded - without supplying any further information about what the codes for these characters are. In other words, it looks like %char2entity is just the list of all char-to-entity relations for reference, NOT the list of chars that should be encoded by default.
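
For anyone who wants to try this, here is a minimal sketch of the comparison, assuming HTML::Entities is installed. The sample string and the variable names are made up for illustration; only encode_entities and %char2entity come from the module.

```perl
use strict;
use warnings;
use HTML::Entities qw( encode_entities %char2entity );

# Made-up sample string containing '@', '.', '<' and '>'.
my $string = 'user@example.com <b>';

# No second argument: the default set is used, so '@' and '.' are left
# alone while '<' and '>' are encoded.
my $default = encode_entities($string);

# Explicit range: every character from \000 to \377 is treated as unsafe.
my $all = encode_entities($string, "\000-\377");

print "default: $default\n";
print "all:     $all\n";

# The reference hash knows '@' even though the default set skips it.
print "lookup:  ", $char2entity{'@'}, "\n";
```

The second argument only selects which characters get looked up; the mapping itself always comes from %char2entity (with a numeric fallback inside the module).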

Re^2: HTML::Entities not encoding @ or .
by Anonymous Monk on Feb 12, 2008 at 16:19 UTC
    Where do you get your information from?
      <humour>A guy in the back alley. I give him the password "monk" and $20 and he spills the beans....</humour>

      No - um, nowhere except in the documentation and the trials described above. The documentation says

      The module can also export the %char2entity and the %entity2char hashes, which contain the mapping from all characters to the corresponding entities (and vice versa, respectively).

      Which I took to mean "all characters that will be encoded by default". Then observed that encode_entities('@') does not encode @. So I wondered if that was because @ was not in the %char2entity hash, working on the assumption that %char2entity is the list of chars to encode by default. Using help from this board, I exported %char2entity and printed it out

      use HTML::Entities;
      use HTML::Entities qw( %char2entity %entity2char ); #thanks ikegami
      foreach $val (keys %char2entity) {
          print "<br>$val => $char2entity{$val}\n";
      }
      and found that @ IS in the %char2entity hash. Then trying your suggestion (assuming this is the same Anonymous Monk) of
      encode_entities($a, "\000-\377");
      found that simply telling the module which characters to encode results in them being encoded, even though that command does not supply any new information about code-character mapping. The module, therefore, must already have that information, and it occurs to me that maybe that's the reason not all chars are encoded by default even though %char2entity contains a full set of char-entity relations - because %char2entity is just a reference hash, NOT the list of chars to be encoded by default.
        That's a weird thing to do, considering the documentation says: The default set of characters to encode are control chars, high-bit chars, and the <, &, >, and " characters.
        Reading the source would also help:
        } else {
            # Encode control chars, high bit chars and '<', '&', '>', ''' and '"'
            $$ref =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$char2entity{$1} || num_entity($1)/ge;
        }
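
To see what that character class actually covers, here is a small sketch that copies the class from the branch quoted above and runs it over printable ASCII. It does not load HTML::Entities at all; the variable names are only for illustration.

```perl
use strict;
use warnings;

# Character class copied from the else branch quoted above.
my $default_unsafe = qr/[^\n\r\t !\#\$%\(-;=?-~]/;

# Printable ASCII characters (0x20 .. 0x7E) that the class matches,
# i.e. the ones that would be encoded by default.
my @encoded = grep { $_ =~ $default_unsafe } map { chr } 0x20 .. 0x7E;

print "encoded by default: @encoded\n";
print "'\@' encoded? ", ( '@' =~ $default_unsafe ? "yes" : "no" ), "\n";
print "'.' encoded? ",  ( '.' =~ $default_unsafe ? "yes" : "no" ), "\n";
```

In the printable range only the double quote, &, ', < and > match, which is why @ and . pass through untouched unless a second argument is given.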