in reply to Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?

SSCCE please... \s works for me. Are you sure you're not using the /a modifier or something like that?

perl -wMstrict -lE "qq/\N{NARROW NO-BREAK SPACE}/ =~ /\A\s\z/ and say 'OK' or die"
OK

Re^2: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?
by nysus (Parson) on Aug 13, 2024 at 15:45 UTC

    Hmmm. Thanks. Maybe it's not the character I think it is:

    > $ perl -wMstrict -lE "qq/Screenshot-2024-02-23-at-1.05.14&#8239;AM.png/ =~ /\s/ and say 'OK' or die"
    > Died at -e line 1.

    The #8239 popped in after submitting this post. It's not actually in the code.

    #8239 is 202F in hex. I don't get this.

    This doesn't even work:

    > $ perl -wMstrict -lE "qq/Screenshot-2024-02-23-at-1.05.14&#8239;AM.png/ =~ /\x{202F}/ and say 'OK' or die"
    > Died at -e line 1.

    $PM = "Perl Monk's";
    $MC = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar Parson";
    $nysus = $PM . ' ' . $MC;
    Click here if you love Perl Monks

      Even though you entered a NNBSP on the command line, your program doesn't contain a NNBSP. (And I'm not referring to the appearance of &#8239;. That's due to a PerlMonks limitation.)

      By default, Perl programs are expected to be encoded using ASCII. NNBSP isn't found in the ASCII character set, so your program can't possibly include a NNBSP.

      Assuming a UTF-8 terminal, what you actually provided Perl is equivalent to "...\xE2\x80\xAF...". But a string containing a NNBSP would be "...\x{202F}...".

      You can tell Perl that the program is encoded using UTF-8 by adding use utf8;.
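
To make the bytes-versus-characters point concrete, here is a minimal sketch. It avoids typing a literal NNBSP and instead spells out the three UTF-8 bytes, then decodes them with the core Encode module. The variable names are only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Without "use utf8", a NNBSP typed into the source arrives as these three bytes.
my $bytes = "\xE2\x80\xAF";

# Decoding turns the UTF-8 bytes into the single character U+202F.
my $char = decode('UTF-8', $bytes);

printf "bytes: length %d, matches \\s: %s\n",
    length $bytes, ($bytes =~ /\s/ ? 'yes' : 'no');
printf "char:  length %d, matches \\s: %s\n",
    length $char, ($char =~ /\A\s\z/ ? 'yes' : 'no');
```

The byte string has length 3 and contains nothing that \s matches; the decoded string has length 1 and is the whitespace character the regex is looking for.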

        OK, thank you! I'm getting closer but still confused AF. So this returns files as desired:

        use utf8;
        my $image_name = 'Screenshot-2024-02-23-at-1.05.14\xE2\x80\xAF';
        my $files = $wac->get_all_files_in_dir($dir . '/uploads', qr/$image_name/);

        But this still does not match:

        use utf8;
        my $image_name = 'Screenshot-2024-02-23-at-1.05.14\s';
        my $files = $wac->get_all_files_in_dir($dir . '/uploads', qr/$image_name/);

        I'm using neovim. It shows file is also encoded as UTF-8.
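
One likely explanation, assuming get_all_files_in_dir hands back file names as the raw bytes the filesystem returns (its internals are not shown in this thread): use utf8 only affects literals in the source file, not data read at run time. A pattern of \xE2\x80\xAF matches those three raw bytes, while \s needs a decoded character to match. A minimal sketch, with a hypothetical @raw_names list standing in for the directory listing:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical stand-in for what a directory listing returns: raw bytes.
my @raw_names = (
    "Screenshot-2024-02-23-at-1.05.14\xE2\x80\xAFAM-1024x698.png",
    "Screenshot-2024-02-23-at-1.05.14.png",
);

my $pattern = qr/Screenshot-2024-02-23-at-1\.05\.14\s/;

# Matching the undecoded bytes finds nothing: \s does not match \xE2.
my @byte_hits = grep { $_ =~ $pattern } @raw_names;

# Decode each name to characters for the match, but keep the original
# bytes in the result so they can still be passed to open() or unlink().
my @char_hits = grep { decode('UTF-8', $_) =~ $pattern } @raw_names;

printf "byte matches: %d, decoded matches: %d\n",
    scalar @byte_hits, scalar @char_hits;
```

If that assumption holds, decoding the names before matching (or inside the method that lists them) is what lets \s find the NNBSP.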


      The #8239 popped in after submitting this post. It's not actually in the code.

      Yeah, PerlMonks does that to Unicode characters in <code> blocks - see my node here.

      In that case, I would suspect an encoding error - see ikegami's reply and my node here.

        I copy and pasted the file name into a file and did a hex dump:

                  00 01 02 03 04 05 06 07 - 08 09 0A 0B 0C 0D 0E 0F  0123456789ABCDEF
        00000000  53 63 72 65 65 6E 73 68 - 6F 74 2D 32 30 32 34 2D  Screenshot-2024-
        00000010  30 32 2D 32 33 2D 61 74 - 2D 31 2E 30 35 2E 31 34  02-23-at-1.05.14
        00000020  E2 80 AF 41 4D 2D 31 30 - 32 34 78 36 39 38 2E 70  ...AM-1024x698.p
        00000030  6E 67 0A                                           ng.

        E2 80 AF

        is the UTF-8 encoding. I wonder if the act of cutting and pasting is modifying the string. I'm using tmux.
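
A quick way to check those three bytes against the character is to encode U+202F and print the result as hex. This is only a sketch using the core Encode module; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode the character U+202F (NARROW NO-BREAK SPACE) as UTF-8 bytes.
my $utf8 = encode('UTF-8', "\x{202F}");

# Render each byte as two uppercase hex digits, hex-dump style.
my $hex = join ' ', map { sprintf '%02X', ord } split //, $utf8;

print "$hex\n";    # E2 80 AF
```

Since the output agrees with the dump, the pasted file name does appear to contain a NNBSP in UTF-8 form, which suggests the copy and paste through tmux did not alter it.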
