in reply to Delimiters in Regexp::Common

... a bug with the module.

The behavior you're seeing is due to confusion between the // used to delimit the regex and a / used within the regex. IIRC, Regexp::Common munges the regex operators and/or uses the tied hash %RE to support things like -delim and -keep and so forth, and apparently the delimiter confusion is passed over silently as a result. "Properly" delimited regexes don't have this problem:

c:\@Work\Perl\monks>perl -wMstrict -le
"use Regexp::Common qw[ delimited ];
 my $P1 = '../matrix-ops/matopmul.mk';
 my $P2 = 'C:\matrix-ops\matopmul.mk';
 print \"P1 has path\n\" if ($P1 =~ /$RE{delimited}{-delim => '\/'}/ );
 print \"P2 has path\n\" if ($P2 =~ /$RE{delimited}{-delim => '\/'}/ );
"
P1 has path

c:\@Work\Perl\monks>perl -wMstrict -le
"use Regexp::Common qw[ delimited ];
 my $P1 = '../matrix-ops/matopmul.mk';
 my $P2 = 'C:\matrix-ops\matopmul.mk';
 print \"P1 has path\n\" if ($P1 =~ m{$RE{delimited}{-delim => '\/'}} );
 print \"P2 has path\n\" if ($P2 =~ m{$RE{delimited}{-delim => '\/'}} );
"
P1 has path
P2 has path

What I have for $Regexp::Common::VERSION and $Regexp::Common::delimited::VERSION are 2011121001 and 2010010201, respectively, so what I have installed is a bit old. Same results under both ActiveState 5.8.9 and Strawberry 5.14.4.1.

I don't know if this behavior constitutes a bug in the module or not, but in regexes in general, if a character that's used to delimit the regex appears unescaped within the regex, that's a problem:

c:\@Work\Perl\monks>perl -wMstrict -le "print 'match' if '/' =~ ///;"
syntax error at -e line 1, near "/;"
Execution of -e aborted due to compilation errors.

c:\@Work\Perl\monks>perl -wMstrict -le "print 'match' if '/' =~ /\//;"
match
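
For what it's worth, here is a minimal sketch of the two usual ways around the delimiter clash: escape the delimiter, or pick a delimiter that doesn't occur in the pattern. It uses core Perl only; the sub names are made up for illustration and have nothing to do with Regexp::Common.

```perl
use strict;
use warnings;

# Three equivalent tests for "string contains a forward slash".
# The sub names are hypothetical, chosen only for this example.

# 1. Keep // as the delimiter and escape the slash inside the pattern.
sub has_slash_escaped { my ($str) = @_; return $str =~ /\// ? 1 : 0; }

# 2. Use bracketing delimiters, so the slash needs no escape.
sub has_slash_braces  { my ($str) = @_; return $str =~ m{/} ? 1 : 0; }

# 3. Use any other non-bracketing delimiter that is absent from the pattern.
sub has_slash_bang    { my ($str) = @_; return $str =~ m!/! ? 1 : 0; }

for my $path ('../matrix-ops/matopmul.mk', 'C:\matrix-ops\matopmul.mk') {
    printf "%-28s escaped=%d braces=%d bang=%d\n",
        $path,
        has_slash_escaped($path),
        has_slash_braces($path),
        has_slash_bang($path);
}
```

All three forms compile to the same pattern; which one to use is mostly a readability choice.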

(And BTW: I'm not sure how the presence of a delimited sequence or subsequence signifies a "path"; there may be some semantic confusion here.)

Update 1: If you're interested in more readable regexes, do yourself a huge favor and investigate the /x regex modifier.
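
A small sketch of what /x buys you, assuming a made-up task (grab the text between the first pair of forward slashes); the pattern and the sub name are illustrative only. Under /x, whitespace in the pattern is ignored and # starts a comment, so the regex can be laid out and annotated piece by piece.

```perl
use strict;
use warnings;

# Hypothetical pattern: capture the text between the first two forward slashes.
# qr{}x is used so the slashes need no escaping and the pattern can be commented.
my $segment_rx = qr{
    /           # opening slash
    ( [^/]* )   # capture: any run of non-slash characters
    /           # closing slash
}x;

# Return the captured segment, or undef if the string has no such pair.
sub first_segment {
    my ($str) = @_;
    return $str =~ $segment_rx ? $1 : undef;
}

my $seg = first_segment('../matrix-ops/matopmul.mk');
print defined $seg ? "segment: $seg\n" : "no segment\n";
```

One caveat: comments inside a /x pattern must not contain the regex's own closing delimiter, because Perl finds the end of the pattern before it looks at the comments.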

Update 2: Fixed small, cosmetic-only formatting glitch in first two code examples.


Give a man a fish:  <%-{-{-{-<

Re^2: Delimiters in Regexp::Common
by Veltro (Hermit) on May 07, 2018 at 11:15 UTC

    No bugs if you ask me. A couple of examples of the correct way to use single quotes are {-delim => '/\\'} or {-delim => '\/'} or {-delim => '\\/'}.

    I don't know how the module interprets this; it probably applies quotemeta or something like that.

    The point is that in:

    print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\\\/' }/ );

    the quoted string becomes

    -delim => '\/' ;

    In:

    print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\/' }/ );

    The quoted string becomes:

    -delim => '/' ;

    This is because the \ is treated as an escape character between //.

    It is always tedious in Perl how mistakes like this slip in. For example, take a quick look at swl's example: I think the '\' character is specified twice in:

    print $RE{delimited}{-delim => '[\\\/]'};

    I even hope that I did not make any mistakes myself right now :P

      The point is that in:
      print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\\\/' }/ );
      the quoted string becomes
      -delim => '\/' ;
      In:
      print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\/' }/ );
      The quoted string becomes:
      -delim => '/' ;
      This is because the \ is treated as an escape character between //.

      I disagree. In the index expression of an array (positional or associative), the expression is evaluated in scalar context and not in the double-quotish context of a regex into which the array element may happen to be interpolated. So '\\\/' and '\/' are evaluated in single-quotish context and become the character sequences \\/ and \/ respectively. And because of the way backslashes are interpreted in single-quote context, '\\\\/' and '\\\/' are equivalent, and '\\/' and '\/' likewise. E.g.:

      c:\@Work\Perl\monks\Veltro>perl -wMstrict -MData::Dump -le
      "my %RE = (
         '\\\\/' => 'BackBackFwd1',
         '\\\/'  => 'BackBackFwd2',
         '\\/'   => 'BackFwd1',
         '\/'    => 'BackFwd2',
         '/'     => 'Fwd',
         );
       dd \%RE;
       ;;
       my $rx = qr{ $RE{'\\\\/'} $RE{'\\\/'} $RE{'\\/'} $RE{'\/'} $RE{'/'} };
       print $rx;
      "
      { "/" => "Fwd", "\\/" => "BackFwd2", "\\\\/" => "BackBackFwd2" }
      (?^: BackBackFwd2 BackBackFwd2 BackFwd2 BackFwd2 Fwd )

      There are a couple of Data::Dump::dd() and hash peculiarities:

      • Why is the key of the value "BackBackFwd2" in the dd dump represented as "\\\\/" when it's given as '\\\/' in the hash definition? This is an artifact of the way dd represents strings only as double-quoted strings, in which a single literal backslash can only be written as the "\\" escape sequence.
      • Why is there no 'BackBackFwd1' value in the hash? Because the '\\\\/' and '\\\/' string literals compile to identical character sequences (update: i.e., identical keys), and the second key (with the value 'BackBackFwd2') supersedes the first. (And likewise with 'BackFwd1'.)


      Give a man a fish:  <%-{-{-{-<

        You may want to have another look at this, because neither of the following lines compiles:

        print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '/' }/ );
        print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '\\/' }/ );