in reply to Re^4: XML::DOM and Accented Characters
in thread XML::DOM and Accented Characters

not the C3 A9 I'm looking for

Then you're not looking for UTF-8!!!!!

$ perl -e"print qq!\x{C3A9}!"
Wide character in print at -e line 1.
쎩

$ perl -Mopen=:std,:encoding(UTF-8) -e"print qq!\x{C3A9}!" |hexdump
00000000: EC 8E A9 - | |
00000003;

$ perl -Mopen=:std,:encoding(UTF-16LE) -e"print qq!\x{C3A9}!" |hexdump
00000000: A9 C3 - | |
00000002;

$ perl -Mopen=:std,:encoding(UTF-16BE) -e"print qq!\x{C3A9}!" |hexdump
00000000: C3 A9 - | |
00000002;

UTF-16BE shows C3 A9, and that is not UTF-8 as encoding="UTF-8" claims.

Re^6: XML::DOM and Accented Characters
by freeflyer (Novice) on Aug 07, 2010 at 12:14 UTC

    I'm not sure what you mean. All the encoding tables I have looked at show the e-acute as hex C3 A9 under UTF-8.

      I'm not sure what you mean. All the encoding tables I have looked at show the e-acute as hex C3 A9 under UTF-8.

      Sure they do, it's Perl that must be broken :)

      http://www.fileformat.info/info/unicode/char/c3a9/index.htm
      Encodings
      HTML Entity (decimal) &#50089;
      HTML Entity (hex) &#xc3a9;
      How to type in Microsoft Windows Alt +C3A9
      UTF-8 (hex) 0xEC 0x8E 0xA9 (ec8ea9)
      UTF-8 (binary) 11101100:10001110:10101001
      UTF-16 (hex) 0xC3A9 (c3a9)
      UTF-16 (decimal) 50,089
      UTF-32 (hex) 0x0000C3A9 (c3a9)
      UTF-32 (decimal) 50,089
      C/C++/Java source code "\uC3A9"
      Python source code u"\uC3A9"
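
For what it's worth, here is a minimal sketch of how both observations can be true at once. It assumes the underlying mix-up is between the code point U+C3A9 (the Hangul syllable printed by the one-liners above) and the byte pair C3 A9, which is the UTF-8 encoding of U+00E9 (e-acute). The helper name hex_bytes is made up for this illustration and is not part of Encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode one code point and return its bytes
# as space-separated uppercase hex.
sub hex_bytes {
    my ($encoding, $codepoint) = @_;
    my $bytes = encode($encoding, chr($codepoint));
    return join ' ', map { sprintf('%02X', ord($_)) } split(//, $bytes);
}

# U+00E9 is e-acute; U+C3A9 is the Hangul syllable from the one-liners.
for my $case ([0x00E9, 'UTF-8'], [0xC3A9, 'UTF-8'], [0xC3A9, 'UTF-16BE']) {
    my ($codepoint, $encoding) = @$case;
    printf "U+%04X as %-8s => %s\n",
        $codepoint, $encoding, hex_bytes($encoding, $codepoint);
}
```

The first line of output is the one the encoding tables describe: the character U+00E9 becomes the two bytes C3 A9 in UTF-8. The other two lines match the hexdump results earlier in the thread for the character U+C3A9.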