in reply to Bypass utf-8 encoding/decoding?

What sort of (special) characters are present in the strings that you are handling now? Is the native encoding of those strings, as they are handled within the Perl application, UTF-8 or something else?

Re^2: Bypass utf-8 encoding/decoding?
by chayyoo (Novice) on Nov 30, 2017 at 19:05 UTC

    They are mainly accented characters such as à, é or ö. They are read from a UTF-8 encoded file using "<:encoding(UTF-8)", which converts them (if I understand it right) to Perl's internal format, presumably also UTF-8. However, if I don't call encode("utf8",$_) first, they arrive in my C function as Latin-1, not UTF-8. The result is also written to a file in UTF-8 using ">:encoding(UTF-8)", but if I don't call decode("utf8",..) on the string returned from my function, I get doubly UTF-8 encoded strings! So either:

    • Perl's internal string encoding is Latin-1, not UTF-8 (at least for the characters I'm currently dealing with), or
    • Perl's internal string encoding is UTF-8, but the string is converted to Latin-1 when passed to my C function, and converted back from Latin-1 when the result is read back.

    Or is there another explanation? In any case, regardless of what characters I'm dealing with, is there a way to bypass all this recoding in an elegant and reliable way? Or is the Perl interpreter smart enough to do all the bypassing by itself, so that I incur no speed penalty?
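
To make the question concrete, here is a minimal sketch of the round trip described above, without the C function. The sample string "café" and all variable names are made up for illustration. It only shows that a decoded character string and its UTF-8 byte form are different strings, and that the same character string can be stored internally in either of two ways; it does not claim to explain what the actual C binding does.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A character string as it would look after reading through
# "<:encoding(UTF-8)": \x{E9} is the single character é (U+00E9).
my $chars = "caf\x{E9}";

# The bytes a C function expecting UTF-8 needs: é becomes 0xC3 0xA9.
my $utf8_bytes = encode('UTF-8', $chars);

printf "characters: %d, UTF-8 bytes: %d\n",
    length($chars), length($utf8_bytes);   # characters: 4, UTF-8 bytes: 5

# Turning returned bytes back into characters before writing them
# through ">:encoding(UTF-8)"; skipping this would encode them twice.
my $back = decode('UTF-8', $utf8_bytes);
print "round trip ok\n" if $back eq $chars;

# The same character string can be held internally in two ways
# without changing its value as seen from Perl code.
my $up = $chars;
utf8::upgrade($up);        # internal storage: UTF-8
my $down = $chars;
utf8::downgrade($down);    # internal storage: one byte per character
print "equal either way\n" if $up eq $down;
printf "is_utf8: up=%d down=%d\n",
    (utf8::is_utf8($up) ? 1 : 0), (utf8::is_utf8($down) ? 1 : 0);
```

Since both internal forms compare equal in Perl, code that reads the raw buffer from C may see either one unless the bytes are produced explicitly, which would be consistent with the Latin-1 bytes observed above.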