in reply to Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?

\s matches whitespace characters, including U+202F NARROW NO-BREAK SPACE.

$ perl -le'print "\x{202F}" =~ /^\s\z/ ? "match" : "no match"'
match

Do you have an NNBSP, or do you have its UTF-8 encoding? Don't forget to decode your inputs (and encode your outputs)!
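To illustrate the difference, here is a minimal sketch using the core Encode module. The byte string is a made-up stand-in for undecoded input, not your actual data.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The three UTF-8 bytes that encode U+202F NARROW NO-BREAK SPACE,
# as they would arrive from an undecoded file or argument (assumed input).
my $bytes = "\xE2\x80\xAF";

# Undecoded: Perl sees three separate characters, none of them whitespace.
my $raw_matches = $bytes =~ /\s/ ? 1 : 0;

# Decoded: a single character, U+202F, which \s matches.
my $chars           = decode('UTF-8', $bytes);
my $decoded_matches = $chars =~ /^\s\z/ ? 1 : 0;

printf "raw:     %d chars, %s\n", length($bytes), $raw_matches     ? "match" : "no match";
printf "decoded: %d chars, %s\n", length($chars), $decoded_matches ? "match" : "no match";

# Encode again before writing the string back out as bytes.
my $out = encode('UTF-8', $chars);
```

If the regex fails on your data, the first case (bytes that were never decoded) is the usual suspect.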

If you need further help, please provide the output of sprintf( "%vX", $_ ) for a string that supposedly includes an NNBSP.
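As a rough guide to reading that output, here is a sketch that dumps the same hypothetical fragment both ways. The sample string is an assumption for illustration only.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample: "14", U+202F, "AM" as a decoded character string.
my $decoded = "14\x{202F}AM";

# The same text as UTF-8 bytes, i.e. what you hold if nothing decoded it.
my $encoded = encode('UTF-8', $decoded);

# %vX prints the ordinal of each character in hex, joined by dots.
my $decoded_dump = sprintf "%vX", $decoded;
my $encoded_dump = sprintf "%vX", $encoded;

print "decoded: $decoded_dump\n";   # 31.34.202F.41.4D
print "encoded: $encoded_dump\n";   # 31.34.E2.80.AF.41.4D
```

Seeing 202F means you have the character itself. Seeing E2.80.AF means you have its UTF-8 encoding and still need to decode.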

Re^2: Any good ways to handle NARROW NO-BREAK SPACE characters in regex in newer versions of Perl?
by nysus (Parson) on Aug 13, 2024 at 15:57 UTC

    I'm not sure what I have. The file definitely has an invisible character in it. When I copy and paste the file name:

    Works:

    perl -wMstrict -lE "qq/Screenshot-2024-02-23-at-1.05.14 AM.png/ =~ /A/ and say 'OK' or die"

    Doesn't work:

    perl -wMstrict -lE "qq/Screenshot-2024-02-23-at-1.05.14 AM.png/ =~ /\s/ and say 'OK' or die"

    The character between the "4" and "A" is the invisible character.


      I would strongly recommend not using the command line, as it can have its own encoding issues. Instead, put everything in a script. If the script contains non-ASCII characters, make sure to save it as UTF-8 and add use utf8; to it.
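      A minimal sketch of such a script might look like the following. The file name is an illustrative copy of yours, with the suspected character written as \x{202F} so it stays visible; with that escape the source is pure ASCII and use utf8; is not strictly needed, but it is harmless and becomes necessary once you paste the literal character in.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                             # source file is saved as UTF-8
use open qw(:std :encoding(UTF-8));   # decode STDIN, encode STDOUT/STDERR

# Illustrative file name; \x{202F} spells out the NNBSP explicitly.
my $name = "Screenshot-2024-02-23-at-1.05.14\x{202F}AM.png";

# \s matches U+202F here because $name is a character string.
my $has_space = $name =~ /\s/ ? 1 : 0;
print $has_space ? "OK\n" : "no whitespace found\n";

# Dump the code points around the suspect character for inspection.
my $dump = sprintf "%vX", substr($name, -8, 4);
print "$dump\n";
```

      Note that file names read from disk (readdir, glob, @ARGV) arrive as bytes and would still need decoding before the regex sees them.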