in reply to utf8::upgrade weirdness
It so happens that the two-byte utf8 encoding of "\xe9" (a.k.a. é) is 0xC3 0xA9, but don't confuse that with "\x{c3a9}", which represents a completely different Unicode code point (U+C3A9, a Hangul syllable).
If you read enough of perlunicode to understand how utf8 works (look for the section titled "Unicode Encodings"), you can figure out why the code point U+00E9 (expressible in Perl 5.8 as just "\xe9") becomes the two-byte sequence 0xC3 0xA9 when it is encoded as utf8. But hex-numeric literals in strings and regexes express code points (characters), not the bytes of their utf8 encoding. Note the following:
perl -e '$x="\xe9"; $y="\x{00e9}"; print "\\xe9 eq \\x00e9\n" if ($x eq $y)' # output is: \xe9 eq \x00e9
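The arithmetic behind the 0xC3 0xA9 result can be checked with a small Perl sketch. It assumes only the two-byte rule described in perlunicode (code points U+0080..U+07FF are written as 110xxxxx 10yyyyyy) and the core Encode module; the helper name utf8_two_byte is hypothetical, made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: two-byte UTF-8 form for code points U+0080..U+07FF,
#   110xxxxx 10yyyyyy  (xxxxx = top 5 bits, yyyyyy = low 6 bits)
sub utf8_two_byte {
    my ($cp) = @_;
    die "code point out of two-byte range\n" unless $cp >= 0x80 && $cp <= 0x7FF;
    return (0xC0 | ($cp >> 6), 0x80 | ($cp & 0x3F));
}

my @manual  = utf8_two_byte(ord "\xe9");              # computed by hand
my @encoded = unpack 'C*', encode('UTF-8', "\xe9");   # bytes produced by Encode

printf "manual: %s\n", join ' ', map { sprintf('%02X', $_) } @manual;
printf "Encode: %s\n", join ' ', map { sprintf('%02X', $_) } @encoded;

# "\x{c3a9}" is a single character, U+C3A9, not the two bytes above
my $other = "\x{c3a9}";
printf "length %d, code point U+%04X\n", length $other, ord $other;
```

Both lines print C3 A9, while the last line reports one character, U+C3A9.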
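Update: To give a direct answer to your question:
"Why is the latin e letter with acute not getting upgraded to UTF-8?"
Actually, the letter is being upgraded to utf8; you were just comparing it to the wrong literal value. After utf8::upgrade the string still holds the single character U+00E9 (so it is still eq "\xe9"); only Perl's internal storage changes to the two bytes 0xC3 0xA9.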
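A minimal sketch of what utf8::upgrade does, assuming a plain script without `use utf8`: the internal utf8 flag flips, but length and eq still see the same single character. Encoding a copy with utf8::encode is just one way to peek at the bytes.

```perl
use strict;
use warnings;

my $s = "\xe9";                              # one character, U+00E9
my $before = utf8::is_utf8($s) ? 1 : 0;      # 0: stored internally as the single byte 0xE9
utf8::upgrade($s);                           # switch to Perl's internal utf8 storage
my $after = utf8::is_utf8($s) ? 1 : 0;       # 1: now stored internally as 0xC3 0xA9

# The string's value is unchanged: still one character, equal to "\xe9"
print "flag: $before -> $after\n";
print "length: ", length($s), "\n";
print "eq \\xe9: ",     ($s eq "\xe9"     ? "yes" : "no"), "\n";
print "eq \\x{c3a9}: ", ($s eq "\x{c3a9}" ? "yes" : "no"), "\n";

# Peek at the utf8 bytes by encoding a copy (the original stays a character string)
my $bytes = $s;
utf8::encode($bytes);
print "encoded bytes: ", unpack('H*', $bytes), "\n";   # c3a9
```

The upgraded string matches "\xe9", not "\x{c3a9}", which is the point of the answer above.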
And if you are trying to print "\xe9" to a filehandle as utf8 data, you must first put the filehandle into utf8 mode, e.g.:
perl -e 'binmode STDOUT, ":utf8"; print "\xe9"' | xxd # output is: 0000000: c3a9 ..
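The same effect can be seen without a terminal or xxd. This sketch writes "\xe9" through two in-memory filehandles (standing in for STDOUT or a file) and dumps the resulting bytes. It uses the :encoding(UTF-8) layer instead of :utf8; that is a swapped-in choice, often recommended because it checks for well-formed UTF-8 on input, while for output like this both layers produce the same bytes.

```perl
use strict;
use warnings;

# Write the same one-character string through two in-memory filehandles.
my ($raw_buf, $utf8_buf) = ('', '');

open my $raw, '>', \$raw_buf or die "open raw: $!";
print {$raw} "\xe9";                 # no layer: written as the single byte 0xE9
close $raw or die "close raw: $!";

open my $enc, '>:encoding(UTF-8)', \$utf8_buf or die "open utf8: $!";
print {$enc} "\xe9";                 # encoding layer: written as 0xC3 0xA9
close $enc or die "close utf8: $!";

print "raw:   ", unpack('H*', $raw_buf),  "\n";   # e9
print "UTF-8: ", unpack('H*', $utf8_buf), "\n";   # c3a9
```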