in reply to Re^11: Seeking Perl docs about how UTF8 flag propagates (Terminology)
in thread Seeking Perl docs about how UTF8 flag propagates

That's wrong.

It doesn't "default to ASCII". It works against decoded text, aka a string of Unicode Code Points. Always. This can be demonstrated using "\N{U+100}" =~ /\w/ (which matches). You need to use /a to limit it to the ASCII range.
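A minimal sketch of that demonstration, for anyone who wants to run it. The variable names are only illustrative, and the /a modifier assumes Perl 5.14 or later.

```perl
use strict;
use warnings;

# U+0100 (LATIN CAPITAL LETTER A WITH MACRON) is a word character in Unicode.
my $char = "\N{U+100}";

# No feature pragmas are in effect here, yet \w still applies Unicode rules.
my $unicode_match = $char =~ /\w/ ? 1 : 0;

# /a restricts \w to the ASCII range [A-Za-z0-9_].
my $ascii_match = $char =~ /\w/a ? 1 : 0;

print "plain \\w matches:   $unicode_match\n";   # 1
print "\\w with /a matches: $ascii_match\n";     # 0
```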

Text encoded using ASCII happens to work because $x eq encode( "US-ASCII", $x ).

Text encoded using iso-latin-1 happens to work because $x eq encode( "iso-latin-1", $x ) (though do see last paragraph).

Those are just side effects of \w working on decoded text.
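Here is a rough sketch of the two identities above. The sample strings are made up, "iso-8859-1" is used as the canonical name for iso-latin-1, and the UTF-8 comparison at the end is my own addition, included only to show an encoding where the identity does not hold.

```perl
use strict;
use warnings;
use Encode qw( encode );

my $ascii_text  = "abc_123";
my $latin1_text = "caf\xE9";    # four characters, the last one is U+00E9

# For these two encodings, each code point maps to the byte with the same value,
# so the encoded string compares equal to the decoded one.
my $ascii_same  = $ascii_text  eq encode( "US-ASCII",   $ascii_text )  ? 1 : 0;
my $latin1_same = $latin1_text eq encode( "iso-8859-1", $latin1_text ) ? 1 : 0;

# UTF-8 encodes U+00E9 as two bytes, so the identity breaks.
my $utf8_same = $latin1_text eq encode( "UTF-8", $latin1_text ) ? 1 : 0;

print "US-ASCII:   $ascii_same\n";    # 1
print "iso-8859-1: $latin1_same\n";   # 1
print "UTF-8:      $utf8_same\n";     # 0
```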

There was a bug where \w didn't work for characters in U+0080..U+00FF sometimes. This was fixed 12 years ago in 2011. Add use v5.14; to get the fix.
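The "sometimes" can be sketched like this, assuming the script is run as a plain file with no locale or feature pragmas in the outer scope. The variable names are illustrative. Without the fix, the result depends on how the string happens to be stored internally; under use v5.14 it does not.

```perl
use strict;
use warnings;

my $downgraded = "\xE9";        # U+00E9, stored without the UTF8 flag
my $upgraded   = "\xE9";
utf8::upgrade($upgraded);       # same character, different internal storage

# Legacy behaviour: the outcome depends on the internal storage format.
my $legacy_downgraded = $downgraded =~ /\w/ ? 1 : 0;
my $legacy_upgraded   = $upgraded   =~ /\w/ ? 1 : 0;

my ( $fixed_downgraded, $fixed_upgraded );
{
    use v5.14;                  # enables the unicode_strings feature in this block
    $fixed_downgraded = $downgraded =~ /\w/ ? 1 : 0;
    $fixed_upgraded   = $upgraded   =~ /\w/ ? 1 : 0;
}

print "legacy: $legacy_downgraded $legacy_upgraded\n";   # 0 1
print "fixed:  $fixed_downgraded $fixed_upgraded\n";     # 1 1
```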

Re^13: Seeking Perl docs about how UTF8 flag propagates (Terminology)
by LanX (Saint) on May 22, 2023 at 20:50 UTC
    > There was a bug where \w didn't work for characters in U+0080..U+00FF sometimes. This was fixed 12 years ago in 2011. Add use v5.14; to get the fix.

    I call default anything without pragmas. But we can agree that the default is buggy.

    Cheers Rolf
    (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
    Wikisyntax for the Monastery

      It doesn't "default to ASCII". It works against decoded text, aka a string of Unicode Code Points. Always. Even without pragmas. This can be demonstrated using "\N{U+100}" =~ /\w/ (which matches). You need to use /a to limit it to the ASCII range.

        we were talking about encoded text without UTF8 flag, but ...

        DB<3> p utf8::is_utf8("\N{U+100}")
        1

        Please, let's stop it here.

        Cheers Rolf
        (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
        Wikisyntax for the Monastery