in reply to UTF8/Unicode Confusion

Here's my guess, from the "Byte and Character Semantics" section of perlunicode:

"However, as an interim compatibility measure, Perl aims to provide a safe migration path from byte semantics to character semantics for programs. For operations where Perl can unambiguously decide that the input data are characters, Perl switches to character semantics. For operations where this determination cannot be made without additional information from the user, Perl decides in favor of compatibility and chooses to use byte semantics."

So Perl 5.8 can't reliably guess with \x{}, and I have to give it the hint.
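
A minimal sketch of what that looks like, assuming the "hint" is utf8::upgrade (that's just one way to do it, and the original thread may have used something else). The helper is_word_char is made up for illustration. A string built from \x{e9} has no code point above 0xFF, so Perl keeps it as a byte string and, without the unicode_strings feature or a /u regex modifier, I'd expect \w not to match it until the string is upgraded:

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original thread): returns 1 if the
# first character of the string matches \w, otherwise 0.
sub is_word_char {
    my ($s) = @_;
    return $s =~ /^\w/ ? 1 : 0;
}

# e-acute written with \x{}: the code point is below 0x100, so Perl
# stores the string as bytes and falls back to byte semantics.
my $s = "\x{e9}";

print "before hint: ", is_word_char($s), "\n";

# The hint: same characters, but the string is now marked as character
# data, so the regex engine applies Unicode rules to it.
utf8::upgrade($s);

print "after hint:  ", is_word_char($s), "\n";
```

The string content doesn't change between the two calls, only how Perl is told to interpret it, which is the "additional information from the user" the quote is talking about.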