in reply to Re^9: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?
in thread Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?

And we're back to the original question (answered here): You're missing use utf8;.
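To make the effect of use utf8; concrete, here is a minimal sketch. It assumes the file is saved as UTF-8, and it uses a visible stand-in character (é, U+00E9) rather than the NNBSP so the literal is easy to see. The variable names are illustrative only. Note that use v5.36; turns on strict, warnings and a feature bundle, but not utf8.

```perl
use v5.36;

# No `use utf8` in effect here: the two UTF-8 bytes of the literal
# are treated as two separate characters.
my $default = "é";

# With `use utf8` (lexically scoped), the same literal is decoded
# into a single character.
my $decoded = do { use utf8; "é" };

say length $default;           # 2
say sprintf "%vX", $default;   # C3.A9
say length $decoded;           # 1
say sprintf "%vX", $decoded;   # E9
```

The same thing happens with a literal NNBSP: without use utf8; the string holds the three bytes E2 80 AF, not the single character 202F.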

Also, you should be using sprintf "%vX", $_ instead of unpack "H*", $_. The former handles any string. The latter only handles strings of bytes (strings in which no character is higher than 0xFF), so it's definitely inappropriate here.
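A small sketch of the difference, under the assumption that you want to inspect a decoded string containing a NNBSP. The variable names are made up for illustration. unpack "H*" is only applied after encoding to UTF-8, since it is meant for bytes.

```perl
use v5.36;
use Encode qw( encode_utf8 );

# A decoded string: three characters, the first one above 0xFF.
my $chars = "\x{202F}AM";

# %vX prints each character's code point in hex, joined by dots.
my $by_char = sprintf "%vX", $chars;

# unpack "H*" works on bytes, so encode the string to UTF-8 first.
my $by_byte = unpack "H*", encode_utf8($chars);

say $by_char;   # 202F.41.4D
say $by_byte;   # e280af414d
```

The first dump shows characters (one entry, 202F, for the NNBSP); the second shows the three UTF-8 bytes e2 80 af that encode it.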

#!/usr/bin/perl

use v5.36;
use warnings;

# Source code encoded using UTF-8.
use utf8;

# Terminal provides/expects UTF-8 (for `say`).
use open ":std", ":encoding(UTF-8)"; 

use Encode qw( decode_utf8 );

my $base = "Screenshot-2024-02-23-at-1.05.14\x{202F}AM.png";
my $lit  = "Screenshot-2024-02-23-at-1.05.14 AM.png";

my @files = map { decode_utf8 $_ } glob( "*" );
my ( $file ) = grep { /^Screenshot-2024-02-23-at-1.05.14\s/ } @files;

my $base_hex = sprintf "%vX", $base;

for ( $base, $lit, $file ) {
   say $_;
   say $_ eq $base ? "same" : "different";

   my $hex = sprintf "%vX", $_;
   say $hex;
   say $hex eq $base_hex ? "same" : "different";
}

Output:

Screenshot-2024-02-23-at-1.05.14 AM.png
same
53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.202F.41.4D.2E.70.6E.67
same
Screenshot-2024-02-23-at-1.05.14 AM.png
same
53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.202F.41.4D.2E.70.6E.67
same
Screenshot-2024-02-23-at-1.05.14 AM.png
same
53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.202F.41.4D.2E.70.6E.67
same
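
On the use open ":std", ":encoding(UTF-8)" line in the program: it adds an encoding layer to STDIN/STDOUT/STDERR, so say can print characters above 0xFF without a "Wide character" warning. Here is a rough sketch of what such a layer does, using an in-memory filehandle as a stand-in for STDOUT. The names $name, $buffer and $out are illustrative, not part of the program above.

```perl
use v5.36;

my $name = "1.05.14\x{202F}AM";

# Open an in-memory file with an encoding layer, standing in for STDOUT.
my $buffer = '';
open my $out, '>:encoding(UTF-8)', \$buffer or die "open failed: $!";
print {$out} $name;
close $out;

# The layer turned the single NNBSP character into its three UTF-8 bytes.
say sprintf "%vX", $buffer;   # 31.2E.30.35.2E.31.34.E2.80.AF.41.4D
```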

Re^11: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?
by nysus (Parson) on Aug 13, 2024 at 18:06 UTC

    Wait, SORRY! You are right! Looks like I added a stray semicolon to the name of the file in $blah, so the script was still failing. Holy crap, I'm an idiot. Using `utf8` does get the two hex dumps to match now. OK, now to wrap my head around all this. Jesus.

    THANK YOU!

    $PM = "Perl Monk's";
    $MC = "Most Clueless Friar Abbot Bishop Pontiff Deacon Curate Priest Vicar Parson";
    $nysus = $PM . ' ' . $MC;
    Click here if you love Perl Monks

Re^11: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?
by nysus (Parson) on Aug 13, 2024 at 17:55 UTC

    It makes no difference if I use utf8 or not (and I thought using use v5.36 set utf8 out of the box, anyway). It still fails.

    Also, same result with sprintf "%vX", just slightly different output. Try it:

    #! /usr/bin/env perl

    use v5.36;
    use utf8;
    use Encode 'decode';

    # get all the files in the current directory
    my @files = map { decode 'UTF-8', $_ } glob("*");
    my ($file) = grep { /Screenshot-2024-02-23-at-1.05.14\s/ } @files;

    my $ss = $files[0];
    my $hex = sprintf "%vX", $ss;
    say $hex;
    say $file; # ERROR!

    my $blah = "Screenshot-2024-02-23-at-1.05.14&#8239;AM.png";
    my $hex2 = sprintf "%vX", $blah;
    say $hex2;

    say $hex eq $hex2 ? "hexes equal" : "hexes not equal";
    say $blah =~ /Screenshot-2024-02-23-at-1.05.14\s/; # WORKS!

    OUTPUTS:

    53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.202F.41.4D.2E.70.6E.67
    Wide character in say at ./test.pl line 16.
    Screenshot-2024-02-23-at-1.05.14 AM.png
    53.63.72.65.65.6E.73.68.6F.74.2D.32.30.32.34.2D.30.32.2D.32.33.2D.61.74.2D.31.2E.30.35.2E.31.34.26.23.38.32.33.39.3B.41.4D.2E.70.6E.67
    hexes not equal

      You used the 7-character string "&#8239;" (26.23.38.32.33.39.3B) instead of a NNBSP (202F) in your program X_X.

      I added a program to my previous post.
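
      To see the difference in isolation, here is a short sketch comparing the HTML entity text with the actual character. The variable names are illustrative only; 8239 is the decimal form of 0x202F.

      ```perl
      use v5.36;

      my $entity = '&#8239;';   # seven ASCII characters
      my $nnbsp  = chr 8239;    # one character, U+202F

      say sprintf "%vX", $entity;   # 26.23.38.32.33.39.3B
      say sprintf "%vX", $nnbsp;    # 202F

      # \s matches the real NNBSP, but not the entity text.
      say "14${entity}AM" =~ /14\s/ ? "match" : "no match";   # no match
      say "14${nnbsp}AM"  =~ /14\s/ ? "match" : "no match";   # match
      ```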

        Yeah, now that I'm feeling less frustrated, I'm going to go through this thread a few times and try to wrap my head around it all.
