in reply to Re: Behaviour of Encode::decode_utf8 on ASCII
in thread Behaviour of Encode::decode_utf8 on ASCII
and the difference does matter from a performance point of view.

CAVEAT: When you run $string = decode("utf8", $octets), then $string may not be equal to $octets. Though they both contain the same data, the utf8 flag for $string is on unless $octets entirely consists of ASCII data (or EBCDIC on EBCDIC machines). See "The UTF-8 flag" below.
UTF-8 tagged values in Perl are contagious: concatenating one with an untagged value results in a tagged value (all well and good). But the regex engine is slower on Unicode strings than on byte strings.
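
A minimal sketch of the contagion, for anyone who wants to see it on their own perl. The variable names are made up for illustration, and utf8::upgrade is used only to get a tagged string regardless of which Encode version is installed. The last line just reports what decode_utf8 does with pure-ASCII input on the perl you run it on; I'm deliberately not stating what it prints, since that depends on the Encode version.

```perl
use strict;
use warnings;
use Encode qw(decode_utf8);

# Force the UTF-8 flag on, independent of the installed Encode version.
my $tagged = "abc";
utf8::upgrade($tagged);          # same characters, internal flag now on

my $untagged = "def";            # ordinary byte string, flag off

# Concatenation: the result is tagged because one operand was.
my $joined = $tagged . $untagged;

printf "tagged:   %s\n", utf8::is_utf8($tagged)   ? "on" : "off";
printf "untagged: %s\n", utf8::is_utf8($untagged) ? "on" : "off";
printf "joined:   %s\n", utf8::is_utf8($joined)   ? "on" : "off";

# Report what decode_utf8 does with ASCII-only octets on this installation.
my $octets  = "plain ascii";
my $decoded = decode_utf8($octets);
printf "decoded:  %s\n", utf8::is_utf8($decoded) ? "on" : "off";
```

Note that the strings still compare equal with eq; only the internal representation (and which code paths the regex engine takes) differs.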
Basically with this change in behaviour you can lose performance in a utf8-aware-and-correct application which has the vast majority of its inputs in ASCII, since the previously uncommon case of handling unicode strings is now the 100% case.
This isn't theoretical: I'm fighting a significant CPU cost increase, which adds up over many servers.
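
For anyone who wants to measure this on their own setup, here is a rough sketch using the core Benchmark module. The sample text and the pattern are invented for illustration and are not taken from my application, so the size of any difference you see will depend on your perl version and your real patterns.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical input: the same ASCII text, once untagged and once tagged.
my $bytes = "The quick brown fox jumps over the lazy dog. " x 200;
my $chars = $bytes;
utf8::upgrade($chars);           # same content, UTF-8 flag on

# Hypothetical pattern; substitute the regexes your application uses.
my $re = qr/\b(\w+)\s+dog\b/i;

# Run each case for at least one CPU second and print a comparison table.
cmpthese(-1, {
    untagged => sub { my $n = () = $bytes =~ /$re/g },
    tagged   => sub { my $n = () = $chars =~ /$re/g },
});
```

Both subs count matches over identical text, so any gap in the reported rates comes from how the string is represented internally rather than from its content.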