in reply to Re^6: Seeking Perl docs about how UTF8 flag propagates
in thread Seeking Perl docs about how UTF8 flag propagates

> > Without the UTF8 flag it's an octet stream, and all string commands will treat every single byte (for backward compatibility) as (some) ASCII character

> Nitpick: Single bytes also work in the 128-255 range, so it is ISO-8859-1 rather than ASCII. For example, an (ä) encoded as chr 0xE4 matches qr/\w/, according to its Unicode property.

I can't reproduce this. From what I see, \w defaults to pure ASCII:

    C:\Users\rolflangsdorf>perl
    $a=chr(0xE4);
    print "$a matched \\w" if $a =~/^\w/;
    __END__

    C:\Users\rolflangsdorf>perl -v

    This is perl 5, version 32, subversion 1 (v5.32.1) built for MSWin32-x64-multi-thread

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)

Re^8: Seeking Perl docs about how UTF8 flag propagates
by haj (Vicar) on May 22, 2023 at 15:45 UTC

    As has been said in this thread (and elsewhere): to reproduce, declare use 5.012; (or newer), or, more explicitly, use feature 'unicode_strings';
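A minimal sketch of the difference, assuming a single-byte string that does not carry the UTF8 flag. The sub names matches_word_legacy and matches_word_unicode are made up for this illustration; the only thing that differs between them is whether the feature is in scope where the regex is compiled.

```perl
use strict;
use warnings;

# Code point 0xE4 (ä) as a one-character string without the UTF8 flag.
my $char = chr(0xE4);

# Hypothetical helper: regex compiled WITHOUT unicode_strings in scope.
sub matches_word_legacy {
    no feature 'unicode_strings';
    return $_[0] =~ /^\w/ ? 1 : 0;
}

# Hypothetical helper: regex compiled WITH unicode_strings in scope.
sub matches_word_unicode {
    use feature 'unicode_strings';
    return $_[0] =~ /^\w/ ? 1 : 0;
}

print "legacy:          ", matches_word_legacy($char),  "\n";   # 0
print "unicode_strings: ", matches_word_unicode($char), "\n";   # 1

# Upgrading the string (setting the UTF8 flag) also makes the
# legacy regex apply Unicode rules to it.
my $upgraded = $char;
utf8::upgrade($upgraded);
print "legacy, upgraded: ", matches_word_legacy($upgraded), "\n";   # 1
```

The feature is lexically scoped, so it only affects operations compiled inside the block or file that enables it.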

    I have the habit of always specifying a minimum Perl version in my programs, including demos for PerlMonks. I admit it didn't occur to me that without a version declaration (or with a declaration of 5.010 or older) Perl behaves differently.

      Well, you were the one who started "nitpicking"! :)

      edit

      I'm wondering why it got such a confusing name.

      From the docs:

      • The 'unicode_strings' feature

        use feature 'unicode_strings' tells the compiler to use Unicode rules in all string operations executed within its scope

      Calling it "unicode_rules" would make more sense to me.
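To illustrate that the feature changes the rules applied by string operations rather than the string itself, here is a small sketch using lc on the same single-byte input in two lexical scopes. The variable names are chosen for this example only.

```perl
use strict;
use warnings;

# 0xC4 is LATIN CAPITAL LETTER A WITH DIAERESIS (Ä), stored as one byte.
my $upper = chr(0xC4);

my ($legacy, $unicode);
{
    # Feature not in scope: bytes above 0x7F have no case mapping.
    no feature 'unicode_strings';
    $legacy = lc $upper;
}
{
    # Feature in scope: Unicode case mapping applies, Ä becomes ä (0xE4).
    use feature 'unicode_strings';
    $unicode = lc $upper;
}

printf "input has UTF8 flag: %s\n", utf8::is_utf8($upper) ? "yes" : "no";   # no
printf "legacy lc:  0x%02X\n", ord $legacy;    # 0xC4
printf "unicode lc: 0x%02X\n", ord $unicode;   # 0xE4
```

The input is identical in both blocks; only the scope in which lc is compiled differs.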

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)