in reply to Printing undecoded utf8 -- safe?
Yes, the output will be garbled: perl thinks the contents of the string are ISO-Latin-1, so each byte will be "helpfully" converted to UTF-8 in the process.
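To make the garbling concrete, here is a small sketch. It assumes the output handle has a UTF-8 layer, and it uses Encode::encode() to stand in for what that layer would write; the sample bytes (the UTF-8 encoding of U+00E9) are just an illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Raw UTF-8 octets for U+00E9 (e with acute accent). Without decoding,
# perl treats these as two separate Latin-1 characters.
my $raw_utf8 = "\xC3\xA9";

# Stand-in for a UTF-8 output layer: every "character" is encoded again,
# so the two octets become four.
my $garbled = encode('UTF-8', $raw_utf8);

printf "%d characters in, %d octets out\n",
    length $raw_utf8, length $garbled;
```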
You could just set the UTF-8 flag on the string and leave the bytes as they are. One way is to use the function _utf8_on() in Encode. It's not exactly private, but you're advised to use it very sparingly. Another way is to use pack like this:
$perl_utf8 = pack 'U0a*', $raw_utf8;
I'd recommend checking that the string is in a "consistent state" afterwards, with utf8::valid(), for example.
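Put together, both approaches plus the consistency check might look like the sketch below. The sample bytes and the variable names other than $raw_utf8 and $perl_utf8 are made up for illustration; utf8::is_utf8() is only there to show that the flag got set.

```perl
use strict;
use warnings;
use Encode ();

# Illustrative input: the UTF-8 octets for U+00E9.
my $raw_utf8 = "\xC3\xA9";

# Approach 1: flip the flag on a copy, leaving the bytes untouched.
my $flagged = $raw_utf8;
Encode::_utf8_on($flagged);

# Approach 2: the pack idiom from above.
my $perl_utf8 = pack 'U0a*', $raw_utf8;

# Neither approach validates the bytes, so check afterwards.
for my $str ($flagged, $perl_utf8) {
    die "string is not in a consistent state" unless utf8::valid($str);
}

printf "flag set: %s, length: %d\n",
    (utf8::is_utf8($perl_utf8) ? 'yes' : 'no'), length $perl_utf8;
```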
P.S. I just came across this function in the docs for utf8. I haven't tried it, but it sounds like something you could use:
- utf8::decode($string)
- Attempts to convert in-place the octet sequence in UTF-X to the corresponding character sequence. The UTF-8 flag is turned on only if the source string contains multiple-byte UTF-X characters. If $string is invalid as UTF-X, returns false; otherwise returns true.
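Going by that description, usage would be roughly as follows. This is an untested-in-anger sketch: the sample byte strings are invented, and the point is mainly that the return value tells you whether the input was valid.

```perl
use strict;
use warnings;

# Valid input: the UTF-8 octets for U+00E9.
my $string = "\xC3\xA9";
my $ok = utf8::decode($string);    # converts $string in place
print "decoded to ", length($string), " character(s)\n" if $ok;

# Invalid input: a lead byte followed by a non-continuation byte.
my $bad = "\xC3\x28";
my $bad_ok = utf8::decode($bad);
print "second string is not valid UTF-8\n" unless $bad_ok;
```

Unlike just setting the flag, this gives you the validity check for free, since a false return means the string was not well-formed.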
Re^2: Printing undecoded utf8 -- safe?
by ryantate (Friar) on Mar 06, 2006 at 17:53 UTC