in reply to utf8::upgrade weirdness

Note that utf8::valid is an internal function, and shouldn't be needed or useful in production code.

\x{c3a9} is not a valid Unicode code point; I think you meant \xc3\xa9. But even that won't match, because perl still treats the string as a sequence of characters, the third of which is the code point U+00E9. If you want to create a string where each character is one byte of the UTF-8 encoding, use Encode, not the utf8:: functions:

use Encode qw(encode);
$string = encode("utf8", $string);
This gives the same result whether or not you've called utf8::upgrade($string) first, because encode works on the string's characters, not on perl's internal representation of them.
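A minimal sketch of the points above, assuming a hypothetical three-character string "ab\x{e9}" (not the original poster's data). Upgrading changes only perl's internal storage, so both copies compare equal and encode to the same four bytes; the byte pattern \xc3\xa9 matches only after encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical example string: "\x{e9}" is ONE character, U+00E9.
my $plain    = "ab\x{e9}";
my $upgraded = "ab\x{e9}";
utf8::upgrade($upgraded);    # changes only the internal storage format

# Character semantics are unaffected by the upgrade.
print "same characters\n" if $plain eq $upgraded;

# The two-character pattern \xc3\xa9 is not in the character string...
print "no match on characters\n" unless $plain =~ /\xc3\xa9/;

# ...but encode() produces one character per UTF-8 byte,
# so U+00E9 becomes the two characters "\xc3\xa9".
my $bytes_a = encode("utf8", $plain);
my $bytes_b = encode("utf8", $upgraded);

print "match on bytes\n"  if $bytes_a =~ /\xc3\xa9/;
print "identical bytes\n" if $bytes_a eq $bytes_b;

# Dump the encoded bytes in hex.
print join(' ', map { sprintf '%02x', ord } split //, $bytes_a), "\n";
```

Running it should print "same characters", "no match on characters", "match on bytes", "identical bytes", and then the hex dump "61 62 c3 a9".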

Re^2: utf8::upgrade weirdness
by graff (Chancellor) on Aug 09, 2006 at 03:24 UTC
    Actually, "\x{c3a9}" is a valid code point. You can look it up.
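      Oops. I had searched perl's unicore/UnicodeData.txt for an exact match, but the Hangul syllables are listed there only as a range, so the file has just the two endpoints, and U+C3A9 lies between them:
      AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
      D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;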
      Thanks for the correction.
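Instead of grepping UnicodeData.txt, you can ask perl directly. This is a minimal sketch: the range endpoints are the two quoted from UnicodeData.txt, while \p{Lo}, \p{Hangul} and charnames::viacode are standard Perl features for checking the general category, script and name of a character.

```perl
use strict;
use warnings;
use charnames ();    # load charnames::viacode without importing \N{} handling

my $cp = 0xC3A9;

# UnicodeData.txt stores the Hangul syllables as one range,
# with endpoints U+AC00 (First) and U+D7A3 (Last).
my ($first, $last) = (0xAC00, 0xD7A3);
printf "U+%04X in range: %s\n", $cp,
    ($first <= $cp && $cp <= $last) ? 'yes' : 'no';

# Perl's Unicode properties give the same answer without the file.
my $char = chr $cp;
print "general category Lo\n" if $char =~ /\p{Lo}/;
print "script Hangul\n"       if $char =~ /\p{Hangul}/;

# Hangul syllable names are derived algorithmically; viacode computes them.
print charnames::viacode($cp), "\n";
```

The last line should print a name beginning with "HANGUL SYLLABLE", which confirms that U+C3A9 is an assigned code point even though it has no line of its own in the data file.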