in reply to Re^2: How to set the UTF8 flag?
in thread How to set the UTF8 flag?

Many—including the OP, apparently—assume it indicates whether the characters[1] of the string are code points or bytes. It does not.

It's a bit that indicates the internal storage format of the string.

Being internal, you have no reason to access it unless you are debugging an XS module (which must deal with the two formats) or Perl itself. In such cases, you can use the aforementioned utf8::is_utf8 or Devel::Peek's Dump. C code has access to the similar SvUTF8 and sv_dump.
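A minimal sketch of inspecting the flag (the sample strings are just illustrative): a string holding only values up to 0xFF can stay in byte storage, while one containing a value above 0xFF must use upgraded (utf8) storage, so its flag is on.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

my $bytes = "caf\x{e9}";          # all values <= 0xFF: byte storage, flag off
my $chars = "caf\x{e9}\x{263a}";  # contains a value > 0xFF: upgraded storage, flag on

print utf8::is_utf8($bytes) ? "on" : "off", "\n";  # off
print utf8::is_utf8($chars) ? "on" : "off", "\n";  # on

Dump($chars);   # look for UTF8 in the FLAGS line of the sv_dump output
```

Note that the flag says nothing about what the string *means*: both strings above contain the same first four characters as far as substr and ord are concerned.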


  1. I define character as an element of a string as returned by substr( $_, $i, 1 ) or ord( substr( $_, $i, 1 ) ), whatever the value means.

Replies are listed 'Best First'.
Re^4: How to set the UTF8 flag?
by harangzsolt33 (Deacon) on Aug 19, 2025 at 19:35 UTC
    Oh okay. I don't understand what you mean by utf8 vs UTF-8. Is there a difference?

    Also, I'm not sure why I got two thumbs down on my question. Is it not allowed to ask questions in here anymore? There must have been some rule changes since 2016 when I first got on this forum.

      utf8 is a Perl-specific extension of UTF-8 capable of encoding any 72-bit value (but it's limited to encoding values the size of UVs in practice).
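      A quick way to see why the internal format needs more room than Unicode's 1,114,112 code points: Perl happily stores "characters" far beyond U+10FFFF, up to the size of a UV (this sketch uses 0x7FFFFFFF, which fits in an IV on any build; the warning category silenced here is the standard `non_unicode` subcategory of `utf8` warnings).

```perl
use strict;
use warnings;
no warnings 'non_unicode';       # chr() above U+10FFFF warns by default

my $big = chr(0x7FFF_FFFF);      # far beyond Unicode's last code point, U+10FFFF
printf "ord: %d\n", ord($big);          # the value round-trips through the string
printf "flag: %d\n", utf8::is_utf8($big) ? 1 : 0;  # must use upgraded storage
```

Such values are not valid Unicode, but Perl's extended utf8 format can represent them, which is why its range exceeds what standard UTF-8 allows.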

      I didn't downvote, but it's probably because you completely fabricated a definition of the flag.

        The mystery deepens. You said 72-bit value? Why? There are only 1,114,112 code points. That could be stored in 3 bytes. What kind of 72-bit values are you referring to?