in reply to Re^2: Delimiters in Regexp::Common
in thread Delimiters in Regexp::Common

The point is that in:
print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\\\/' }/ );
the quoted string becomes
-delim => '\/' ;
In:
print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\/' }/ );
The quoted string becomes:
-delim => '/' ;
This is because the \ is treated as an escape character between //.

I disagree. In the index expression of an array (positional or associative), the expression is evaluated in scalar context and not in the double-quotish context of a regex into which the array element may happen to be interpolated. So  '\\\/' and  '\/' are evaluated in single-quotish context and become the character sequences  \\/ and  \/ respectively. And because of the way backslashes are interpreted in single-quote context,  '\\\\/' and  '\\\/' are equivalent, and  '\\/' and  '\/' likewise. E.g.:

c:\@Work\Perl\monks\Veltro>perl -wMstrict -MData::Dump -le
"my %RE = (
   '\\\\/' => 'BackBackFwd1', '\\\/' => 'BackBackFwd2',
   '\\/' => 'BackFwd1', '\/' => 'BackFwd2',
   '/' => 'Fwd',
   );
 dd \%RE;
 ;;
 my $rx = qr{ $RE{'\\\\/'} $RE{'\\\/'} $RE{'\\/'} $RE{'\/'} $RE{'/'} };
 print $rx;
"
{ "/" => "Fwd", "\\/" => "BackFwd2", "\\\\/" => "BackBackFwd2" }
(?^: BackBackFwd2 BackBackFwd2 BackFwd2 BackFwd2 Fwd )

There are a couple of Data::Dump::dd() and hash peculiarities:
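
Judging from the output above, two effects appear to be involved: dd() prints keys in double-quoted notation, so every real backslash shows up doubled, and literals that evaluate to the same string collapse into a single hash key, with the last value winning. A minimal core-Perl sketch of both (the values value1 to value5 are placeholders, not taken from the post):

```perl
use strict;
use warnings;

# Five single-quoted literals, as in the one-liner above. Inside single
# quotes, \\ collapses to one backslash and \/ is left untouched.
my @literals = ( '\\\\/', '\\\/', '\\/', '\/', '/' );

# Assign in order, the way a hash list assignment does: when two literals
# yield the same key, the later value replaces the earlier one.
my %RE;
my $n = 0;
$RE{$_} = 'value' . ++$n for @literals;

printf "%d literals, %d distinct keys\n", scalar @literals, scalar keys %RE;

# Print each key raw, together with its length, to separate the real
# characters from the extra backslashes a double-quoted dump would show.
printf "key %-4s length %d => %s\n", $_, length($_), $RE{$_} for sort keys %RE;
```

Only three keys survive, and the key that dd() displays as "\\\\/" is really three characters long.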


Give a man a fish:  <%-{-{-{-<

Re^4: Delimiters in Regexp::Common (updated)
by Veltro (Hermit) on May 07, 2018 at 23:02 UTC

    You may want to have another look at this because each of the next lines do not compile:

    print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '/' }/ );
    print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '\\/' }/ );
      ... the next lines do not compile: ...

      I've played around with this some more and I'm coming to the conclusion that this has little or nothing to do with Regexp::Common::delimited and more to do with the use of a regex delimiter character within the regex pattern. The following works as I expect with any of
          '\/'  '\\/'  '\\\/'  '\\\\/'  '\\\\\/'  '\\\\\\/'
      as the  -delim delimiter specification:

      c:\@Work\Perl\monks\Veltro>perl -wMstrict -le
      "use Regexp::Common qw(delimited);
       ;;
       for my $s (qw(
           a/b/c a\b\c /a/ \a\ a//b a\\\\b // \\\\
           a/b a\b a/b\c a\b/c a/ /a a\ \a / \
           )) {
         print qq{'$s' }, $s =~ m{$RE{delimited}{ -delim => '\/' }} ? '' : 'NO ', ' match';
         }
      "
      'a/b/c' match
      'a\b\c' match
      '/a/' match
      '\a\' match
      'a//b' match
      'a\\b' match
      '//' match
      '\\' match
      'a/b' NO match
      'a\b' NO match
      'a/b\c' NO match
      'a\b/c' NO match
      'a/' NO match
      '/a' NO match
      'a\' NO match
      '\a' NO match
      '/' NO match
      '\' NO match
      Both  m: ... : and the balanced  m{ ... } (my personal preference per TheDamian's regex PBPs) yield the same results.

      For a  / ... / delimited match with the code above, the  -delim strings:

      • '\\\/'  '\\\\\/' work as expected;
      • '\\/'  '\\\\/'  '\\\\\\/' fail to compile (Can't find string terminator "'" ...); and
      • '\/' works partially as expected (go figure).
      Again, the lesson seems to be: be wary of the presence of a delimiter character within a regex pattern.

      IIRC from previous regex compilation discussions (and please don't ask me for a citation :), I think what's happening here is that the regex parser looks for the end of a regex using various heuristics as soon as it sees that a regex has opened, and in this case, it sees the forward-slash at the end of the first  '\\/' (or whatever) single-quoted string and sometimes mistakes it for the regex terminal delimiter. The Perl parser looks for single-quoted strings thereafter, and goes off the rails when it sees that a final single-quote is unmatched. Or something like that... Anyway, don't use  // regex delimiters here.

      Update: The "premature regex termination detection" theory is supported if the
          my $rx = qr{ $RE{'\\\\/'} $RE{'\\\/'} $RE{'\\/'} $RE{'\/'} $RE{'/'} };
      regex from Re^3: Delimiters in Regexp::Common (updated) is re-written with  qr/ ... / instead: the "Can't find string terminator "'" anywhere ..." compilation error results.
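
One way to poke at this theory without aborting the whole script is to compile each variant inside a string eval and look at whether it fails. This is only a sketch: %RE here is a plain stand-in hash rather than Regexp::Common's, and compile_error is a hypothetical helper introduced for the illustration.

```perl
use strict;
use warnings;

# Stand-in for Regexp::Common's %RE (an assumption for this sketch);
# only the way the subscript is parsed matters here.
our %RE = ( '\/' => 'BackFwd' );

# Hypothetical helper: compile (but never run) a code snippet and return
# the compile error, or an empty string if it compiled cleanly.
sub compile_error {
    my ($code) = @_;
    my $sub = eval "sub { $code }";
    return defined $sub ? '' : $@;
}

# Single-quoted heredocs keep every backslash exactly as typed.
my $balanced = <<'CODE';
my $rx = qr{ $RE{'\\/'} };
CODE

my $slashes = <<'CODE';
my $rx = qr/ $RE{'\\/'} /;
CODE

for my $case ( [ 'qr{ ... }', $balanced ], [ 'qr/ ... /', $slashes ] ) {
    my ( $label, $code ) = @$case;
    my $err = compile_error($code);
    print "$label ", ( $err eq '' ? "compiles\n" : "fails: $err" );
}
```

The two snippets differ only in the regex delimiters, so any difference in the result comes from how the end of the regex is located.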


      Give a man a fish:  <%-{-{-{-<

      Regexp::Common returns regexp objects, so one can drop the outer // and it will compile.

      print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '/' } );
      print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '\/' } );

      Then the escaping becomes a consideration.

      use 5.026;
      use Regexp::Common qw[ delimited ];

      say '\/';
      say $RE{delimited}{ -delim => '\/' };
      say '\\/';
      say $RE{delimited}{ -delim => '\\/' };
      say '\\\/';
      say $RE{delimited}{ -delim => '\\\/' };
      say '\\\\/';
      say $RE{delimited}{ -delim => '\\\\/' };

      produces

      \/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\ +.[^\\\/]*)*)(?:\/))) \/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\ +.[^\\\/]*)*)(?:\/))) \\/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\\)(?:[^\\]*(?:(?:\ +\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/))) \\/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\\)(?:[^\\]*(?:(?:\ +\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))

      It also appears that Regexp::Common does not de-duplicate the character sequence before it builds the regexp, as the regexps become more complicated as the sequences increase in length.
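
If the repeated alternatives are unwanted, one possible workaround is to remove duplicate characters from the delimiter string before handing it to -delim. The helper below is hypothetical (it is not part of Regexp::Common) and uses only core Perl:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Common): keep the first
# occurrence of each character and drop any later repeats.
sub dedupe_chars {
    my ($chars) = @_;
    my %seen;
    return join '', grep { !$seen{$_}++ } split //, $chars;
}

my $delims = '\\\/';                 # the three characters \ \ /
my $unique = dedupe_chars($delims);  # the two characters \ /

printf "%d characters before, %d after\n", length $delims, length $unique;
```

Passing $unique as the -delim value would then be the same as the '\/' case shown in the output above.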

        Exactly, these results make 100% sense to me.

        But I don't understand your comment that escaping becomes a consideration (as in a problem?). I would say escaping becomes less of a consideration because you can follow normal quotation rules.