in reply to Re^9: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?
in thread Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?
And we're back to the original question (answered here): You're missing use utf8;.
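To see why `use utf8;` matters, here is a minimal sketch that is not from the original post. It assumes the source file is saved as UTF-8. In that case an NNBSP typed directly into a string literal is stored as the three bytes E2 80 AF. Without `use utf8;`, Perl reads those bytes as three separate characters. With it, Perl reads them as the single character U+202F. The escapes below spell out both readings, so the example does not depend on invisible characters. The same mismatch applies to the file names returned by `glob`, which stay bytes until they are passed through `decode_utf8`.

```perl
#!/usr/bin/perl
use v5.36;
use Encode qw( decode_utf8 );

# How a literal NNBSP is read when `use utf8;` is missing:
# one character per UTF-8 byte (E2 80 AF).
my $as_bytes = "1.05.14\xE2\x80\xAFAM";

# How the same literal is read under `use utf8;`: one character, U+202F.
my $as_chars = "1.05.14\x{202F}AM";

say length $as_bytes;    # 12
say length $as_chars;    # 10

# A pattern written with \x{202F} only matches the decoded form.
say $as_bytes =~ /\x{202F}/ ? "match" : "no match";    # no match
say $as_chars =~ /\x{202F}/ ? "match" : "no match";    # match

# Decoding the bytes (as is done for the glob results) recovers the character.
say decode_utf8($as_bytes) eq $as_chars ? "same" : "different";    # same
```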
Also, you should be using sprintf "%vX", $_ instead of unpack "H*", $_. The former handles any string. The latter only handles strings of bytes (strings whose characters are all no higher than 0xFF), so it's definitely inappropriate here.
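Here is a minimal sketch of the difference. It is an illustration, not code from the thread. `%vX` prints each character's code point in hex, joined with dots, so U+202F shows up as `202F`. `unpack "H*"` expects bytes. Given a character above 0xFF, it emits a "Character in 'H' format wrapped in unpack" warning and keeps only the low 8 bits. As a result, the NNBSP comes out as `2F`, the same as a `/`. The warning is silenced below only to keep the demo output clean.

```perl
#!/usr/bin/perl
use v5.36;

my $nnbsp = "14\x{202F}AM";    # contains U+202F NARROW NO-BREAK SPACE
my $slash = "14/AM";           # "/" is U+002F

# %vX: one hex code point per character, joined with ".".
say sprintf "%vX", $nnbsp;     # 31.34.202F.41.4D
say sprintf "%vX", $slash;     # 31.34.2F.41.4D

# unpack "H*" keeps only the low byte of U+202F, so two different
# strings produce identical hex.
my ( $h1, $h2 );
{
    no warnings qw( unpack );  # silence "Character in 'H' format wrapped"
    $h1 = unpack "H*", $nnbsp;
    $h2 = unpack "H*", $slash;
}
say $h1 eq $h2 ? "H* cannot tell them apart" : "H* distinguishes them";
```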
```perl
#!/usr/bin/perl
use v5.36;
use warnings;

# Source code encoded using UTF-8.
use utf8;

# Terminal provides/expects UTF-8 (for `say`).
use open ":std", ":encoding(UTF-8)";

use Encode qw( decode_utf8 );

my $base = "Screenshot-2024-02-23-at-1.05.14\x{202F}AM.png";
# The gap before "AM" is a literal U+202F typed into the source;
# `use utf8` is what makes Perl read it as a single character.
my $lit  = "Screenshot-2024-02-23-at-1.05.14 AM.png";

# glob returns file names as bytes; decode them (assumes a UTF-8 file system).
my @files = map { decode_utf8 $_ } glob( "*" );
my ( $file ) = grep { /^Screenshot-2024-02-23-at-1\.05\.14\s/ } @files;

my $base_hex = sprintf "%vX", $base;

for ( $base, $lit, $file ) {
   say $_;
   say $_ eq $base ? "same" : "different";
   my $hex = sprintf "%vX", $_;
   say $hex;
   say $hex eq $base_hex ? "same" : "different";
}
```
Output:

```
Screenshot-2024-02-23-at-1.05.14 AM.png
same
53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.202F.41.4D.2E.70.6E.67
same
Screenshot-2024-02-23-at-1.05.14 AM.png
same
53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.202F.41.4D.2E.70.6E.67
same
Screenshot-2024-02-23-at-1.05.14 AM.png
same
53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.202F.41.4D.2E.70.6E.67
same
```
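The `grep` in the script relies on `\s` matching U+202F. It does, as long as the string holds decoded characters. Below is a small sketch of a few other ways to match or normalize the character. It is not from the thread, the list is not exhaustive, and it assumes decoded strings as above. `tr///r` needs Perl 5.14 or later, which `use v5.36` already implies.

```perl
#!/usr/bin/perl
use v5.36;

my $name = "Screenshot-2024-02-23-at-1.05.14\x{202F}AM.png";

# Each of these patterns matches the NNBSP before "AM".
my @patterns = (
    qr/14\sAM/,                           # \s includes Unicode whitespace
    qr/14\hAM/,                           # \h: horizontal whitespace
    qr/14\x{202F}AM/,                     # exact code point
    qr/14\N{NARROW NO-BREAK SPACE}AM/,    # by Unicode character name
    qr/14\p{Zs}AM/,                       # General_Category Space_Separator
);
for my $re (@patterns) {
    say $name =~ $re ? "match" : "no match";
}

# A plain ASCII space does not match it.
say $name =~ /14 AM/ ? "match" : "no match";    # no match

# Replace the NNBSP with an ASCII space (tr///r returns a modified copy).
my $plain = $name =~ tr/\x{202F}/ /r;
say $plain;    # Screenshot-2024-02-23-at-1.05.14 AM.png
```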