in reply to What is the convert rule of utf8::upgrade?
utf8 is a core module. Can you suggest how we could phrase the documentation for utf8::upgrade differently, so that it is clearer what the function does and where its limitations lie?
$num_octets = utf8::upgrade($string)

Converts in-place the internal representation of the string from an octet sequence in the native encoding (Latin-1 or EBCDIC) to UTF-X. The logical character sequence itself is unchanged. If $string is already stored as UTF-X, then this is a no-op. Returns the number of octets necessary to represent the string as UTF-X. Can be used to make sure that the UTF-8 flag is on, so that \w or lc() work as Unicode on strings containing characters in the range 0x80-0xFF (on ASCII and derivatives).
Note that this function does not handle arbitrary encodings. Therefore Encode is recommended for the general purposes; see also Encode.
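To make the quoted behaviour concrete, here is a minimal sketch. It is my own illustration and not part of the documentation, and it assumes an ASCII-based platform where the native encoding is Latin-1. The variable names are made up for the example. It shows that utf8::upgrade changes only the internal storage, and that it does not decode UTF-8 bytes the way Encode does.

```perl
use strict;
use warnings;
use Encode qw(decode);

# One character, U+00E9, held as a single native (Latin-1) octet.
my $string = "\xE9";
utf8::downgrade($string);    # make sure we start from the octet representation

my $flag_before = utf8::is_utf8($string) ? 1 : 0;
my $num_octets  = utf8::upgrade($string);
my $flag_after  = utf8::is_utf8($string) ? 1 : 0;

# The logical string is unchanged: still one character, still U+00E9.
my $length_after = length($string);
my $same_char    = ($string eq "\xE9") ? 1 : 0;

print "flag before: $flag_before\n";    # 0
print "octets:      $num_octets\n";     # 2 (U+00E9 needs two octets in UTF-8)
print "flag after:  $flag_after\n";     # 1
print "length:      $length_after\n";   # 1
print "same char:   $same_char\n";      # 1

# The limitation: upgrade does not interpret the input as UTF-8 text.
# These two octets are the UTF-8 encoding of U+00E9.
my $bytes    = "\xC3\xA9";
my $upgraded = $bytes;
utf8::upgrade($upgraded);
my $upgraded_length = length($upgraded);    # still 2: U+00C3 and U+00A9

# Encode::decode is what turns those octets into one character.
my $decoded        = decode('UTF-8', $bytes);
my $decoded_length = length($decoded);      # 1
my $decoded_ord    = ord($decoded);         # 0xE9

print "upgraded length: $upgraded_length\n";
print "decoded length:  $decoded_length\n";
```

In other words, upgrade treats every input octet as one character of the native encoding. If the data is really UTF-8 (or any other encoding), it has to go through Encode first.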