and the difference does matter from a performance point of view.

From the Encode documentation: "CAVEAT: When you run `$string = decode("utf8", $octets)`, then `$string` may not be equal to `$octets`. Though they both contain the same data, the utf8 flag for `$string` is on unless `$octets` entirely consists of ASCII data (or EBCDIC on EBCDIC machines). See "The UTF-8 flag" below."
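A minimal sketch of what that caveat means in practice. `utf8::is_utf8` reports Perl's internal UTF-8 flag, which is separate from whether two strings compare equal with `eq`. The sample strings are made up for illustration, and the flag printed for the ASCII case depends on which Encode version is installed, which is exactly the behaviour change this thread is about.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Non-ASCII input: the 5 octets "caf\xC3\xA9" are the UTF-8 encoding of "café".
my $octets_latin = "caf\xC3\xA9";
my $string_latin = decode("UTF-8", $octets_latin);

# Decoding yields 4 characters from 5 octets, and the UTF-8 flag is on.
printf "non-ASCII: %d octets -> %d chars, utf8 flag %s\n",
    length($octets_latin), length($string_latin),
    utf8::is_utf8($string_latin) ? "on" : "off";

# ASCII-only input: the decoded characters are identical to the octets...
my $octets_ascii = "hello";
my $string_ascii = decode("UTF-8", $octets_ascii);
print "ASCII: eq? ", ($string_ascii eq $octets_ascii ? "yes" : "no"), "\n";

# ...but whether the internal flag is set depends on the Encode version.
print "ASCII: utf8 flag ", (utf8::is_utf8($string_ascii) ? "on" : "off"), "\n";
```

The `eq` comparison succeeds either way, so code that only looks at string contents cannot tell the difference; the flag matters for how Perl processes the string internally.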
UTF-8-tagged values in Perl are contagious: concatenating one with an untagged value yields a tagged value (all well and good). But the regex engine is slower on Unicode strings than on byte strings.
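A small sketch of the contagion, using made-up sample strings: a single flagged operand is enough to make the whole concatenation flagged, even when almost all of its characters are ASCII.

```perl
use strict;
use warnings;

# A plain byte-string literal: the UTF-8 flag is off.
my $bytes = "plain ASCII ";
# A string containing U+263A (WHITE SMILING FACE): the flag must be on.
my $chars = "smile \x{263A}";

# Concatenation upgrades the untagged operand, so the result is tagged.
my $joined = $bytes . $chars;

printf "bytes:  %s\n", utf8::is_utf8($bytes)  ? "on" : "off";
printf "chars:  %s\n", utf8::is_utf8($chars)  ? "on" : "off";
printf "joined: %s\n", utf8::is_utf8($joined) ? "on" : "off";
```

Any regex subsequently run against `$joined` operates on a UTF-8-flagged string.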
Basically, with this change in behaviour you can lose performance in a UTF-8-aware-and-correct application whose inputs are overwhelmingly ASCII, since the previously uncommon case of handling Unicode strings is now the 100% case.
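One possible workaround, sketched here as an assumption rather than anything proposed in this post: check the raw octets first and skip decoding entirely when they are pure ASCII, since ASCII octets already are the decoded characters. The helper name `decode_utf8_prefer_bytes` is hypothetical; `utf8::downgrade` and `Encode::decode` are the real core functions.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: keep pure-ASCII input as an unflagged byte string,
# and only pay for a real decode (and a flagged result) when needed.
sub decode_utf8_prefer_bytes {
    my ($octets) = @_;
    if ($octets !~ /[^\x00-\x7F]/) {
        # ASCII octets decode to the same characters, so copy them and make
        # sure the copy is unflagged (a no-op for an ordinary byte string).
        my $copy = $octets;
        utf8::downgrade($copy);
        return $copy;
    }
    return decode("UTF-8", $octets);
}

my $ascii = decode_utf8_prefer_bytes("GET /index.html");
my $wide  = decode_utf8_prefer_bytes("caf\xC3\xA9");

printf "ascii: flag %s\n", utf8::is_utf8($ascii) ? "on" : "off";  # off
printf "wide:  flag %s\n", utf8::is_utf8($wide)  ? "on" : "off";  # on
```

The ASCII check is a single scan over an unflagged string, which is cheap compared with decoding and with running many regexes over a flagged result. Restricting the shortcut to ASCII (rather than Latin-1) avoids any difference in regex semantics between flagged and unflagged strings.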
This isn't theoretical: I'm fighting a significant increase in CPU cost, which adds up over many servers.
In reply to Re^2: Behaviour of Encode::decode_utf8 on ASCII by jbert, in thread Behaviour of Encode::decode_utf8 on ASCII by jbert