in reply to Re^5: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates

> Without UTF8-Flag it's a octet-stream and all string commands will treat every single byte (for backward compatibility) as (some) ASCII character
Nitpick: Single bytes also work in the 128-255 range, so it is ISO-8859-1 rather than ASCII. For example, an ä encoded as chr 0xE4 matches qr/\w/, according to its Unicode property.

Re^7: Seeking Perl docs about how UTF8 flag propagates
by LanX (Saint) on May 22, 2023 at 14:18 UTC
    > > Without UTF8-Flag it's a octet-stream and all string commands will treat every single byte (for backward compatibility) as (some) ASCII character

    > Nitpick: Single bytes also work in the 128-255 range, so it is rather ISO-8859-1 than ASCII. For example, an (ä) encoded as chr 0xE4 matches qr/\w/, according to its unicode property.

    I can't reproduce this. From what I see, \w defaults to pure ASCII:

    C:\Users\rolflangsdorf>perl
    $a=chr(0xE4);
    print "$a matched \\w" if $a =~/^\w/;
    __END__

    C:\Users\rolflangsdorf>perl -v

    This is perl 5, version 32, subversion 1 (v5.32.1) built for MSWin32-x64-multi-thread

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      As has been said in this thread (and elsewhere): To reproduce, add use 5.012; (or newer) or, more explicitly, use feature 'unicode_strings';

      I have the habit of always specifying a minimum Perl version in my programs, including demos for PerlMonks. I admit that it didn't occur to me that without a version declaration (or with a declaration of 5.010 or older) Perl behaves differently.
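
For anyone who wants to see the difference side by side, here is a minimal sketch. It assumes a Perl of 5.14 or newer (where the feature also covers regex matching) and no use locale in effect. The variable names and the extra utf8::upgrade case are illustrative additions, not something posted earlier in the thread.

```perl
use strict;
use warnings;

# Byte 0xE4 is "ä" in ISO-8859-1; chr() below 256 yields a string without the UTF8 flag.
my $char = chr(0xE4);

my ($legacy, $unicode, $upgraded);
{
    no feature 'unicode_strings';      # same state as a script with no "use VERSION"
    $legacy = ($char =~ /^\w/) ? 1 : 0;

    # Illustrative extra case: same character, but stored with the UTF8 flag on.
    utf8::upgrade(my $copy = $char);
    $upgraded = ($copy =~ /^\w/) ? 1 : 0;
}
{
    use feature 'unicode_strings';     # also switched on by "use 5.012;" or newer
    $unicode = ($char =~ /^\w/) ? 1 : 0;
}

print "no feature, flag off: $legacy\n";    # 0
print "no feature, flag on:  $upgraded\n";  # 1
print "unicode_strings:      $unicode\n";   # 1
```

The first line is what the no-version one-liner above runs into; the other two show that either the feature or the UTF8 flag is enough to get Unicode rules for \w.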

        well you were the one who started "nitpicking"! :)

        edit

        I'm wondering why it got such a confusing name

        from the docs

        • The 'unicode_strings' feature

          use feature 'unicode_strings' tells the compiler to use Unicode rules in all string operations executed within its scope

        calling it "unicode_rules" would make more sense to me.
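
A small sketch of what "string operations executed within its scope" can mean beyond regexes, using uc as the example. The variable names are made up for illustration, and it assumes no use locale or use bytes is in effect.

```perl
use strict;
use warnings;

my $char = chr(0xE4);   # "ä" as a single byte, UTF8 flag off

# The pragma is lexical, so each do-block gets its own rules.
my $plain = do { no  feature 'unicode_strings'; uc $char };
my $rules = do { use feature 'unicode_strings'; uc $char };

printf "without feature: uc gives 0x%02X\n", ord $plain;   # 0xE4, left unchanged
printf "with feature:    uc gives 0x%02X\n", ord $rules;   # 0xC4, i.e. "Ä"
```

The string itself is identical in both blocks; only the rules applied to it differ.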

        Cheers Rolf
        (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
        Wikisyntax for the Monastery

Re^7: Seeking Perl docs about how UTF8 flag propagates
by LanX (Saint) on May 16, 2023 at 11:34 UTC
    yeah I was too lazy to look it up so I said some ASCII... (which is probably still technically incorrect)

    edit

    doesn't it depend on the locale settings?

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery