in reply to Re^6: XML::DOM and Accented Characters
in thread XML::DOM and Accented Characters

I'm not sure what you mean, all the encoding tables I have looked at show the e-acute as hex C3 A9 under UTF8?

Sure they do, it's perl that must be broken :)

http://www.fileformat.info/info/unicode/char/c3a9/index.htm
Encodings
HTML Entity (decimal) &#50089;
HTML Entity (hex) &#xc3a9;
How to type in Microsoft Windows Alt +C3A9
UTF-8 (hex) 0xEC 0x8E 0xA9 (ec8ea9)
UTF-8 (binary) 11101100:10001110:10101001
UTF-16 (hex) 0xC3A9 (c3a9)
UTF-16 (decimal) 50,089
UTF-32 (hex) 0x0000C3A9 (c3a9)
UTF-32 (decimal) 50,089
C/C++/Java source code "\uC3A9"
Python source code u"\uC3A9"

Re^8: XML::DOM and Accented Characters
by freeflyer (Novice) on Aug 07, 2010 at 12:55 UTC
      I see now, I was using the wrong input for the perl string:
      $ perl -e" binmode STDOUT,q!:encoding(UTF-8)!; print qq!\N{U+00E9}!" | od -tx1
      0000000 c3 a9
      0000002
      $ perl -e" binmode STDOUT,q!:encoding(UTF-8)!; print qq!\xE9!" | od -tx1
      0000000 c3 a9
      0000002
      $ perl -e" binmode STDOUT,q!:encoding(UTF-8)!; print qq!\x{00E9}!" | od -tx1
      0000000 c3 a9
      0000002
      $ perl -e" binmode STDOUT,q!:encoding(UTF-8)!; print qq!\x{C3A9}!" | od -tx1
      0000000 ec 8e a9
      0000003
      You could have avoided my nonsense if you had provided code in Re^4: XML::DOM and Accented Characters, thanks :)
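
For anyone landing here later, the mix-up is between a code point and its UTF-8 byte sequence: \x{...} in a Perl string takes the code point (U+00E9 for e-acute), and C3 A9 is only what that code point looks like after encoding. A minimal sketch of the same check without od, assuming nothing beyond core Perl; utf8_hex is a hypothetical helper name, not something from XML::DOM or this thread:

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 bytes of a code point as a hex string.
sub utf8_hex {
    my ($code_point) = @_;
    my $bytes = chr($code_point);   # a one-character string
    utf8::encode($bytes);           # in place: characters -> UTF-8 octets
    return join ' ', map { sprintf '%02x', $_ } unpack 'C*', $bytes;
}

# U+00E9 is e-acute; U+C3A9 is a different character entirely.
printf "U+%04X -> %s\n", $_, utf8_hex($_) for 0x00E9, 0xC3A9;
```

This should print "U+00E9 -> c3 a9" and "U+C3A9 -> ec 8e a9", matching the od output above.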