Yes, the output will be garbled: perl thinks the contents of the string are ISO-Latin-1, so it will "helpfully" convert them to UTF-8 in the process, encoding the already-encoded bytes a second time.
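To see the double encoding concretely, here is a minimal sketch. The variable names are just for illustration. It calls Encode's encode() in memory, which performs the same conversion a UTF-8 output layer such as :encoding(UTF-8) applies when you print an undecoded byte string.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Raw UTF-8 bytes for "é" (U+00E9), as they might arrive from a file or socket.
my $raw_utf8 = "\xC3\xA9";

# Without the UTF-8 flag, perl treats each byte as one Latin-1 character,
# so the string is two characters long, not one.
print length($raw_utf8), "\n";    # 2

# Encoding to UTF-8 converts each of those two "characters" separately,
# just like printing through a UTF-8 output layer would.
my $double = encode('UTF-8', $raw_utf8);
printf "%vX\n", $double;          # C3.83.C2.A9
```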
You could just set the UTF-8 flag on the string and leave the bytes as they are. One way is to use the function _utf8_on() in Encode. The leading underscore marks it as internal; it's not exactly private, but you're advised to use it very sparingly. Another way is to use pack, like this:
$perl_utf8 = pack 'U0a*', $raw_utf8;
Afterwards, I'd recommend checking that the string is in a "consistent state", for example with utf8::valid().
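Both ways of setting the flag, followed by the utf8::valid() check, might look like this sketch. It assumes Encode is available (it has been core since 5.8). $flagged, $packed and $bad are illustrative names, and the last part deliberately builds a malformed string to show what the check catches.

```perl
use strict;
use warnings;
use Encode ();

my $raw_utf8 = "\xC3\xA9";    # UTF-8 bytes for "é" (U+00E9)

# Option 1: turn the UTF-8 flag on in place; the bytes themselves don't change.
my $flagged = $raw_utf8;      # copy, so $raw_utf8 stays a byte string
Encode::_utf8_on($flagged);

# Option 2: the pack idiom; U0 mode joins the UTF-8 bytes into characters.
my $packed = pack 'U0a*', $raw_utf8;

for my $s ($flagged, $packed) {
    # utf8::valid() checks that the internal representation is consistent.
    die "inconsistent UTF-8\n" unless utf8::valid($s);
    print length($s), "\n";   # 1 (a single character, U+00E9)
}

# A truncated sequence shows why the check is worth doing:
# _utf8_on() flips the flag without looking at the bytes.
my $bad = "\xC3";
Encode::_utf8_on($bad);
print utf8::valid($bad) ? "valid\n" : "invalid\n";    # invalid
```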
P.S. I just came across this function in the docs for utf8. I haven't tried it, but it sounds like something you could use:
- utf8::decode($string)
- Attempts to convert in-place the octet sequence in UTF-X to the corresponding character sequence. The UTF-8 flag is turned on only if the source string contains multiple-byte UTF-X characters. If $string is invalid as UTF-X, returns false; otherwise returns true.
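If it behaves as documented, using utf8::decode() would look something like this sketch (variable names are illustrative). Unlike _utf8_on(), it validates the bytes and returns false when they aren't valid UTF-8, so you can act on the result.

```perl
use strict;
use warnings;

my $string = "\xC3\xA9";    # UTF-8 bytes for "é" (U+00E9)

# Decodes in place; returns false if the bytes aren't valid UTF-8.
utf8::decode($string) or die "not valid UTF-8\n";
print length($string), "\n";       # 1
printf "U+%04X\n", ord $string;    # U+00E9

# A truncated sequence is rejected instead of being flagged blindly.
my $broken = "\xC3";
print utf8::decode($broken) ? "decoded\n" : "invalid\n";    # invalid
```

If you'd rather not modify the string in place, Encode's decode('UTF-8', $bytes) returns a new decoded string instead.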
In reply to "Re: Printing undecoded utf8 -- safe?" by bart, in thread "Printing undecoded utf8 -- safe?" by ryantate.