
Where do you get your information from?

Re^3: HTML::Entities not encoding @ or .
by punch_card_don (Curate) on Feb 12, 2008 at 16:53 UTC
    <humour>A guy in the back alley. I give him the password "monk" and $20 and he spills the beans....</humour>

    No - um, nowhere except in the documentation and the trials described above. The documentation says

    The module can also export the %char2entity and the %entity2char hashes, which contain the mapping from all characters to the corresponding entities (and vice versa, respectively).

    Which I took to mean "all characters that will be encoded by default". Then observed that encode_entities('@') does not encode @. So I wondered if that was because @ was not in the %char2entity hash, working on the assumption that %char2entity is the list of chars to encode by default. Using help from this board, I exported %char2entity and printed it out

    use HTML::Entities;
    use HTML::Entities qw( %char2entity %entity2char ); #thanks ikegami
    foreach $val (keys %char2entity) {
        print "<br>$val => $char2entity{$val}\n";
    }

    and found that @ IS in the %char2entity hash. Then trying your suggestion (assuming this is the same Anonymous Monk) of
    encode_entities($a, "\000-\377");
    found that simply telling the module which characters to encode results in them being encoded, even though that command does not supply any new information about the character-to-entity mapping. The module, therefore, must already have that information. It occurs to me that maybe that's the reason not all chars are encoded by default even though %char2entity contains a full set of char-entity relations: because %char2entity is just a reference hash, NOT the list of chars to be encoded by default.
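    The difference between the lookup table and the default set can be checked directly. This is a minimal sketch, assuming HTML::Entities is installed; the sample string is made up for illustration, and only encode_entities and the exportable %char2entity hash mentioned above are used:

```perl
use strict;
use warnings;
use HTML::Entities qw( encode_entities %char2entity );

my $sample = 'user@example.com';   # made-up test string

# '@' has an entry in the lookup table...
my $in_table = exists $char2entity{'@'} ? 'yes' : 'no';

# ...but it is not in the default unsafe set, so it passes through untouched.
my $default = encode_entities($sample);

# Naming the characters explicitly makes the module look them up in the table.
my $explicit = encode_entities($sample, '@.');

print "in table: $in_table\n";
print "default:  $default\n";
print "explicit: $explicit\n";
```

    If the reasoning above holds, the default call should leave the string unchanged, while the explicit call should replace only @ and . with their entities.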
      That's a weird thing to do, considering the documentation says "The default set of characters to encode are control chars, high-bit chars, and the <, &, >, and " characters."
      Reading the source would also be better:
      } else {
          # Encode control chars, high bit chars and '<', '&', '>', ''' and '"'
          $$ref =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$char2entity{$1} || num_entity($1)/ge;
      }
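
      To see which characters that negated class lets through, the same pattern can be run on its own. This is a sketch using only core Perl: the %table hash and the fallback sub are hypothetical stand-ins for the module's %char2entity and num_entity, and the input string is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the module's %char2entity and num_entity().
my %table = ( '<' => '&lt;', '>' => '&gt;', '&' => '&amp;', '"' => '&quot;' );
sub fallback { return sprintf '&#x%X;', ord $_[0] }

sub encode_default {
    my ($text) = @_;
    # Same negated character class as the source line quoted above:
    # any character NOT listed in the class gets replaced.
    $text =~ s/([^\n\r\t !\#\$%\(-;=?-~])/$table{$1} || fallback($1)/ge;
    return $text;
}

my $encoded = encode_default('<a href="mailto:me@example.com">');
print "$encoded\n";
```

      The range ?-~ covers @ and the range \(-; covers the dot, so both fall inside the "leave alone" class, which is why the default call never touches them.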