in reply to Re^2: Handling HTML special characters correctly
in thread Handling HTML special characters correctly

..although "\x{A3}" in UTF-8 would be encoded as two bytes ("\x{C2}\x{A3}"). See the UTF-8 encoding table.
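A minimal sketch of the point above, assuming the core Encode module is available. It only checks how many bytes the pound sign occupies once encoded as UTF-8; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# U+00A3 POUND SIGN as a one-character Perl string
my $pound = "\x{A3}";
my $chars = length $pound;

# Encode the character string into UTF-8 octets
my $bytes = encode('UTF-8', $pound);

printf "characters: %d\n", $chars;            # 1
printf "bytes:      %d\n", length $bytes;     # 2
printf "hex:        %s\n",
    join(' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes);   # C2 A3
```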

Update: removed superfluous parenthesis.

Re^4: Handling HTML special characters correctly
by cosmicperl (Chaplain) on Jul 03, 2008 at 00:25 UTC
    £ never used to cause me a problem on the old RH9. But these days most web servers seem to be set to en_US.UTF-8, where outputting a raw £ will give you a nasty ? in the browser, so it needs to be &pound; these days.

    On a side note just noticed something annoying about HTML::Entities, if your input is already encoded, such as &pound;, you'll get &amp;pound;, thought it would have checked for encoded characters and skipped them?


    Lyle
      On a side note just noticed something annoying about HTML::Entities, if your input is already encoded, such as &pound;, you'll get &amp;pound;,
      Well, how would you encode it? Given an input like
      If you write &amp; in HTML, it turns out as &
      the expected output of such a text after encoding would be:
      If you write &amp;amp; in HTML, it turns out as &amp;
      Now you're saying only the last ampersand should be escaped, because the first one is already escaped? No, you can never know whether a text is already escaped.
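A short sketch of that example, assuming HTML::Entities is installed and using encode_entities with its default set of unsafe characters. The input string is the sample sentence from the post.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities);

# Literal text the author wants readers to see, entity syntax included
my $text = 'If you write &amp; in HTML, it turns out as &';

# Every ampersand is escaped, including the one that starts "&amp;"
my $html = encode_entities($text);

print "$html\n";
# If you write &amp;amp; in HTML, it turns out as &amp;
```

The encoder treats its input as plain text, so a browser rendering the result shows exactly what was typed.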

      thought it would have checked for encoded characters and skipped them?

      That would be BAD! Decoding and encoding your post would change "such as &pound;" to "such as £".
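To illustrate why skipping would lose information, here is a rough sketch that compares the real encode_entities with a hypothetical "smart" encoder. The function encode_skipping_entities is made up for this example and is not part of HTML::Entities; it handles only ampersands.

```perl
use strict;
use warnings;
use HTML::Entities qw(encode_entities decode_entities);

# Hypothetical "smart" encoder (not a real HTML::Entities function):
# leaves anything that already looks like an entity alone and escapes
# only the remaining ampersands.
sub encode_skipping_entities {
    my ($text) = @_;
    $text =~ s/&(?!#?\w+;)/&amp;/g;
    return $text;
}

my $post = 'such as &pound;';   # the literal text the author typed

# Round trip: encode for HTML, then decode as a browser would
my $normal = decode_entities(encode_entities($post));
my $smart  = decode_entities(encode_skipping_entities($post));

print $normal eq $post ? "normal: round trip ok\n" : "normal: text changed\n";
print $smart  eq $post ? "smart: round trip ok\n"  : "smart: text changed\n";
```

With the plain encoder the text survives the round trip. With the skipping variant the typed "&pound;" comes back as the pound character itself, so the post no longer says what its author wrote.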

      On a side note just noticed something annoying about HTML::Entities, if your input is already encoded

      Duh?

      The same thing happens if you try to encode characters twice.
      The same thing happens if you try to encode URL characters twice.
      The same thing happens if you try to zip a string twice.
      etc