Re^3: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?

Even though you entered a NNBSP (U+202F NARROW NO-BREAK SPACE) on the command line, your program doesn't contain a NNBSP. (And I'm not referring to how the character is displayed here; that's due to a PerlMonks limitation.)

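By default (without `use utf8;`), Perl programs are expected to be encoded using ASCII; any byte above 0x7F in a string literal simply becomes the character with that same number. NNBSP isn't found in the ASCII character set, so your program can't possibly include a NNBSP.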

Assuming a UTF-8 terminal, what you actually provided Perl is equivalent to `"...\xE2\x80\xAF..."`. But a string containing a NNBSP would be `"...\x{202F}..."`.
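A minimal sketch of that difference, using the core `Encode` module (the variable names are illustrative only): the three UTF-8 bytes are three separate characters to Perl until they are decoded into the single character U+202F.

```perl
use strict;
use warnings;
use Encode qw(decode);

# What the source contains without `use utf8;`: the three UTF-8 bytes of U+202F.
my $encoded = "\xE2\x80\xAF";

# A string that actually contains the NNBSP character.
my $nnbsp = "\x{202F}";

# Decoding the bytes as UTF-8 turns them into that single character.
my $decoded = decode('UTF-8', $encoded);

printf "encoded: %d chars, decoded: %d char, same as U+202F: %s\n",
    length($encoded), length($decoded), ($decoded eq $nnbsp ? 'yes' : 'no');
```

This prints `encoded: 3 chars, decoded: 1 char, same as U+202F: yes`.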

You can tell Perl that the program is encoded using UTF-8 by adding `use utf8;`.


Re^4: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl? by nysus (Parson) on Aug 13, 2024 at 16:16 UTC
OK, thank you! I'm getting closer but still confused AF. This returns files as desired:

`use utf8; my $image_name = 'Screenshot-2024-02-23-at-1.05.14\xE2\x80\xAF'; my $files = $wac->get_all_files_in_dir($dir . '/uploads', qr/$image_name/);`

But this still does not match:

`use utf8; my $image_name = 'Screenshot-2024-02-23-at-1.05.14\s'; my $files = $wac->get_all_files_in_dir($dir . '/uploads', qr/$image_name/);`

I'm using neovim, and it shows the file is also encoded as UTF-8.

$PM = "Perl Monk's"; $MC = "Most Clueless ~~Friar~~ ~~Abbot~~ ~~Bishop~~ ~~Pontiff~~ ~~Deacon~~ ~~Curate~~ ~~Priest~~ ~~Vicar~~ Parson"; $nysus = $PM . ' ' . $MC;
Re^5: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl? by ikegami (Patriarch) on Aug 13, 2024 at 16:28 UTC
`use utf8;` has no effect in that program since it's encoded using ASCII. But it doesn't hurt, since ASCII is a subset of UTF-8.

The issue is that `get_all_files_in_dir` is matching against the still-encoded file names. (That's also why the first program works: its pattern matches those same three bytes.) The second program is effectively doing

`my $fn = "Screenshot-2024-02-23-at-1.05.14\xE2\x80\xAF"; $fn =~ /Screenshot-2024-02-23-at-1.05.14\s/`

That will only match if U+E2 is a space character, and it isn't.
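To make the `\s` version match, the file name has to be decoded before it is matched. A minimal sketch, with a hard-coded byte string standing in for a name as it comes back from the file system (the name is the example from this thread; in real code it would come from `readdir` or `glob`):

```perl
use strict;
use warnings;
use Encode qw(decode);

# A file name as the OS hands it to Perl: raw UTF-8 bytes, NNBSP = E2 80 AF.
my $fn_bytes = "Screenshot-2024-02-23-at-1.05.14\xE2\x80\xAFAM.png";

# Matching the bytes: \s would have to match "\xE2" (U+00E2), which isn't whitespace.
my $bytes_match = $fn_bytes =~ /Screenshot-2024-02-23-at-1\.05\.14\s/ ? 1 : 0;

# Decoding first turns E2 80 AF into U+202F, which \s does match.
my $fn_chars    = decode('UTF-8', $fn_bytes);
my $chars_match = $fn_chars =~ /Screenshot-2024-02-23-at-1\.05\.14\s/ ? 1 : 0;

print "bytes: $bytes_match, decoded: $chars_match\n";    # bytes: 0, decoded: 1
```

After decoding, matching `\x{202F}` explicitly would work too; `\s` is simply the broader choice.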
Re^6: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl? by nysus (Parson) on Aug 13, 2024 at 16:39 UTC
Ok, before you give up on me: I put the file named `Screenshot-2024-02-23-at-1.05.14 AM.png` (with the hidden space character) in a directory along with this script:

`#! /usr/bin/env perl use v5.36; use utf8; # get all the files in the current directory my @files = glob("*"); my ($file) = grep { /Screenshot-2024-02-23-at-1.05.14\s/ } @files; say $file;`

The above reports:

`Use of uninitialized value $file in say at ./test.pl line 10.`

If I change the regex to `/Screenshot-2024-02-23-at-1.05.14/`, it works fine. I'm beginning to think Perl does not handle these characters in file names properly, but I'm clueless, so that's a wild guess.

EDIT: I should definitely mention I'm on macOS, which I've heard doesn't have the best support for UTF-8.

EDIT2: I tried this script in a Linux Docker container. Same result as on macOS.
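Following the earlier explanation, a script like this needs to decode the names from the directory listing before matching them. A minimal, self-contained sketch, assuming the names on disk are UTF-8 encoded; the scratch directory and file it creates are hypothetical setup so the example runs anywhere, and `readdir` stands in for `glob("*")` (both return bytes):

```perl
#!/usr/bin/env perl
use v5.36;
use Encode qw(decode);
use File::Temp qw(tempdir);

# Hypothetical setup: a scratch directory holding one file whose name
# contains an NNBSP, written as its UTF-8 bytes (as the file system stores it).
my $dir = tempdir(CLEANUP => 1);
my $name_bytes = "Screenshot-2024-02-23-at-1.05.14\xE2\x80\xAFAM.png";
open(my $fh, '>', "$dir/$name_bytes") or die "create: $!";
close($fh);

# readdir (like glob) returns bytes; decode each name (assumed UTF-8).
opendir(my $dh, $dir) or die "opendir: $!";
my @files = map { decode('UTF-8', $_) } readdir($dh);
closedir($dh);

# After decoding, the NNBSP is U+202F, which \s matches.
my ($file) = grep { /Screenshot-2024-02-23-at-1\.05\.14\s/ } @files;

# Characters above U+00FF need an output encoding layer to print cleanly.
binmode(STDOUT, ':encoding(UTF-8)');
say defined $file ? "matched: $file" : "no match";
```

The decoded names are meant for matching and display; to open one of those files, encode the name back to UTF-8 first (for example with `Encode::encode('UTF-8', $file)`) or keep the original byte string alongside it.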
Re^7: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl? by ikegami (Patriarch) on Aug 13, 2024 at 17:39 UTC
Re^7: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl? by nysus (Parson) on Aug 13, 2024 at 17:05 UTC
Re^8: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl? by Corion (Patriarch) on Aug 13, 2024 at 17:17 UTC
Re^8: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl? by nysus (Parson) on Aug 13, 2024 at 17:16 UTC