in reply to Re^7: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates

Thanks!

Encode::_utf8_off is indeed a way to break things (the bytes pragma and XS code are other possibilities). All of them come with the appropriate warning signs, so it's nothing to worry about.

Re^9: Seeking Perl docs about how UTF8 flag propagates
by LanX (Saint) on May 17, 2023 at 10:21 UTC
    So you mean the OP wasn't asking whether the flag gets switched, but whether there are cases of automatic (undocumented) decoding or encoding?

    I would consider any such encoding a bug. I can't see a rationale for degrading from text to binary, unless you use a bit-oriented operator.

    Decoding from binary to text can happen, for example when a transformation such as lc or uc can't map the characters otherwise.°

    But that's documented and can be configured too, AFAIR.

    And to be honest, treating a binary string as ASCII/Latin-1 is guesswork on Perl's part; it could be anything.

    Strictly speaking, lc and uc shouldn't be allowed then; the programmer should have properly decoded it first.

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

    °) or generally if only utf8-text characters are added.
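
A minimal sketch of the two effects described in this post, offered as an illustration rather than as anything the posters wrote. It assumes a reasonably recent Perl (5.12 or later) with no `use locale` and no `use v5.12`-style declaration at file level, since the latter would enable the unicode_strings feature everywhere. All variable names are made up for the example.

```perl
use strict;
use warnings;

# A native 8-bit string: one character, code point 0xE9, UTF8 flag off.
my $bytes = "\xE9";

# A character above 0xFF can only be stored with the UTF8 flag on.
my $wide = "\x{263A}";

# Concatenating the two upgrades the result. Only the internal storage
# changes; the first character is still code point 0xE9.
my $joined = $bytes . $wide;
printf "bytes flagged:        %d\n", utf8::is_utf8($bytes)  ? 1 : 0;
printf "joined flagged:       %d\n", utf8::is_utf8($joined) ? 1 : 0;
printf "first char unchanged: %d\n", substr($joined, 0, 1) eq $bytes ? 1 : 0;

# uc on the unflagged string uses ASCII rules by default, so 0xE9 is
# left alone. The same character in an upgraded string gets Unicode rules.
my $upgraded = $bytes;
utf8::upgrade($upgraded);
my $uc_plain    = uc $bytes;
my $uc_upgraded = uc $upgraded;

# The behaviour is configurable: unicode_strings applies Unicode rules
# regardless of the flag.
my $uc_feature;
{
    use feature 'unicode_strings';
    $uc_feature = uc $bytes;
}

printf "uc plain:    0x%02X\n", ord $uc_plain;
printf "uc upgraded: 0x%02X\n", ord $uc_upgraded;
printf "uc feature:  0x%02X\n", ord $uc_feature;
```

The difference between the plain and upgraded results is what perlunicode calls "The Unicode Bug"; the feature pragma is the documented way to make the flag irrelevant for case mapping.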

      I didn't mean anything related to the OP's question. Following the discussion between hv and you I was concerned whether there might be a need to look at or manipulate the UTF8 flag in some circumstances. Luckily, I still don't see a reason to do this in my code.

        There are a few legitimate reasons to explicitly change the utf8 flag (an external service returning UTF-8 as binary is one example), but they are very rare.
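
A hedged sketch of that scenario, not taken from the thread: the payload below is a hypothetical stand-in for bytes returned by an external service. It compares the usual route (Encode::decode, which validates) with flipping the flag directly via Encode::_utf8_on, which the Encode documentation files under messing with Perl's internals.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical payload: the UTF-8 encoding of U+00E9 arriving as two raw bytes.
my $payload = "\xC3\xA9";

# Flag-flipping route: reinterpret a copy of the buffer, with no validation.
my $flipped = $payload;
Encode::_utf8_on($flipped);

# Usual route: decode checks the bytes and returns a character string.
# LEAVE_SRC keeps $payload intact when a check value is passed.
my $check   = Encode::FB_CROAK() | Encode::LEAVE_SRC();
my $decoded = Encode::decode('UTF-8', $payload, $check);

printf "payload length: %d\n", length $payload;
printf "decoded length: %d\n", length $decoded;
printf "same result:    %d\n", $decoded eq $flipped ? 1 : 0;

# Malformed input is where the two routes differ: decode refuses it,
# whereas _utf8_on would silently produce a corrupt string.
my $bad = "\xFF";    # never valid in UTF-8
my $accepted = eval { Encode::decode('UTF-8', $bad, $check); 1 } ? 1 : 0;
printf "malformed accepted by decode: %d\n", $accepted;
```

For well-formed input both routes yield the same one-character string, which is why flipping the flag can look harmless; the validation step is the reason decoding is usually the safer default.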

        ---
        $world=~s/war/peace/g