in reply to Re^6: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates

> Could you please give an example for this? How do you "change the UTF8 flag"?

use v5.12.0;
use warnings;
#use Devel::Peek;
use utf8;
use Encode qw(is_utf8 _utf8_on _utf8_off);

my $str = "ä";
say $str, ":",length($str);
#Dump($str);

_utf8_off($str);
say $str, ":",length($str);
#Dump($str);

ä:1
ä:2
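
To make visible why the length changes, here is a minimal sketch that prints the code points Perl sees before and after the flag is cleared. It writes the character as an escape and calls utf8::upgrade so the starting representation does not depend on how the source file is saved; the helper name codepoints is made up for this illustration.

```perl
use strict;
use warnings;
use Encode ();    # only needed for the internal _utf8_off used in the post above

# Hypothetical helper: the code points Perl sees in a string, as hex.
sub codepoints { join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

my $str = "\xE4";          # one character, U+00E4 (ä)
utf8::upgrade($str);       # force UTF-8 internal storage: bytes C3 A4, flag on

my $before = codepoints($str);

# Clears the flag only. The buffer still holds C3 A4, which Perl now
# reads as two separate one-byte characters.
Encode::_utf8_off($str);

my $after = codepoints($str);

print "before: $before (length 1)\n";
print "after:  $after (length ", length($str), ")\n";
```

The string's bytes never change; only Perl's interpretation of them does, which is why the same glyph can show up on a UTF-8 terminal while length reports 2.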

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery

Re^8: Seeking Perl docs about how UTF8 flag propagates
by hv (Prior) on May 17, 2023 at 03:51 UTC

    Where the documentation for those functions says "INTERNAL", that should be taken as shorthand for "GO AWAY. THIS IS A REALLY ******* BAD IDEA. PUT DOWN THE UTF8 FLAG AND BACK AWAY."

    It is really quite depressing that this is expressed in shorthand.

    It is not a good idea to use these functions. It is not a good idea to suggest anyone else uses these functions. These functions should almost certainly not exist: there are vanishingly few people that are competent to use them safely, and to the best of my knowledge those people would in all cases know (and prefer) other ways to achieve the same effects.

    The functions in the utf8 module (upgrade, downgrade, encode, decode) are vastly safer, for example.
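
    A rough sketch of the difference between those four functions, using only the utf8:: functions named above (they are available without "use utf8"). The variable names are made up for the example. upgrade and downgrade change the internal storage but leave the string's value alone; encode and decode deliberately convert between characters and UTF-8 bytes.

```perl
use strict;
use warnings;

my $text = "\xE4";                 # one character, U+00E4 (ä)

# upgrade/downgrade: internal storage changes, the value does not.
my $copy = $text;
utf8::upgrade($copy);
my $same_after_upgrade   = ($copy eq $text && length($copy) == 1);
utf8::downgrade($copy);
my $same_after_downgrade = ($copy eq $text && length($copy) == 1);

# encode: characters -> UTF-8 bytes, so the value changes on purpose.
my $bytes = $text;
utf8::encode($bytes);              # now the two bytes C3 A4
my $n_bytes = length($bytes);

# decode: UTF-8 bytes -> characters, undoing the encode.
my $chars = $bytes;
utf8::decode($chars);
my $round_trip = ($chars eq $text);

print "upgrade keeps value:   ", ($same_after_upgrade   ? "yes" : "no"), "\n";
print "downgrade keeps value: ", ($same_after_downgrade ? "yes" : "no"), "\n";
print "encoded length:        $n_bytes\n";
print "decode round-trips:    ", ($round_trip ? "yes" : "no"), "\n";
```

    None of these can leave the flag and the buffer out of step with each other, which is what makes them safer than flipping the flag directly.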

      Did you read the OP's question?
Re^8: Seeking Perl docs about how UTF8 flag propagates
by haj (Vicar) on May 17, 2023 at 06:46 UTC

    Thanks!

    Encode::_utf8_off is indeed a way to break things (use bytes; or XS code being other possibilities). All of them come with the appropriate warning signs. So, it's nothing to worry about.

      So you mean the OP didn't intend to ask if the flag is switched but if there are cases of automatic (undocumented) decoding or encoding?

      I would consider any such encoding a bug. I can't see a rationale for degrading from text to binary, unless you use a bit-oriented operator.

      Decoding from binary to text can happen, for example when a transformation such as lc or uc can't map the result otherwise.°

      But that's documented and can be configured too, AFAIR.

      And to be honest, treating a binary string as ASCII/Latin-1 is guesswork on Perl's part; it could be anything.

      Strictly speaking, lc and uc shouldn't be allowed then; the programmer should have properly decoded it first.

      Cheers Rolf

      °) or generally if only utf8-text characters are added.
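
      A small sketch of both cases, assuming that the "unicode_strings" feature is the configuration meant above. With the feature on, uc applies Unicode rules to a byte string, and the uppercase of U+00FF is U+0178, which no longer fits in one byte, so the result has to be stored as UTF-8. The footnote's case is shown by appending a wide character. Variable names are made up for the example.

```perl
use strict;
use warnings;

my $byte_str = "\xFF";    # ÿ when read as Latin-1

# Feature off: uc() on such a string only maps ASCII letters.
my $plain = do { no feature 'unicode_strings'; uc $byte_str };

# Feature on: Unicode case rules apply, giving U+0178 (Ÿ).
my $unicode = do { use feature 'unicode_strings'; uc $byte_str };

printf "feature off: U+%04X\n", ord($plain);
printf "feature on:  U+%04X, stored as UTF-8: %s\n",
    ord($unicode), (utf8::is_utf8($unicode) ? "yes" : "no");

# Footnote case: appending a character above 0xFF upgrades the result,
# while the first character keeps its value.
my $joined = $byte_str . "\x{263A}";
printf "joined: length %d, first char U+%04X, stored as UTF-8: %s\n",
    length($joined), ord($joined), (utf8::is_utf8($joined) ? "yes" : "no");
```

      utf8::is_utf8 is used here only to observe the internal state; nothing in the sketch sets the flag by hand.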

        I didn't mean anything related to the OP's question. Following the discussion between hv and you I was concerned whether there might be a need to look at or manipulate the UTF8 flag in some circumstances. Luckily, I still don't see a reason to do this in my code.