in reply to Re^3: Delimiters in Regexp::Common (updated)
in thread Delimiters in Regexp::Common

You may want to have another look at this, because neither of the following lines compiles:

print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '/' }/ );
print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '\\/' }/ );

Re^5: Delimiters in Regexp::Common (updated)
by AnomalousMonk (Archbishop) on May 08, 2018 at 20:57 UTC
    ... the next lines do not compile: ...

    I've played around with this some more and I'm coming to the conclusion that this has little or nothing to do with Regexp::Common::delimited and more to do with the use of a regex delimiter character within the regex pattern. The following works as I expect with any of
        '\/'  '\\/'  '\\\/'  '\\\\/'  '\\\\\/'  '\\\\\\/'
    as the -delim delimiter specification:

    c:\@Work\Perl\monks\Veltro>perl -wMstrict -le
    "use Regexp::Common qw(delimited);
    ;;
    for my $s (qw( a/b/c a\b\c /a/ \a\ a//b a\\\\b // \\\\ a/b a\b a/b\c a\b/c a/ /a a\ \a / \ )) {
      print qq{'$s' }, $s =~ m{$RE{delimited}{ -delim => '\/' }} ? '' : 'NO ', ' match';
      }
    "
    'a/b/c' match
    'a\b\c' match
    '/a/' match
    '\a\' match
    'a//b' match
    'a\\b' match
    '//' match
    '\\' match
    'a/b' NO match
    'a\b' NO match
    'a/b\c' NO match
    'a\b/c' NO match
    'a/' NO match
    '/a' NO match
    'a\' NO match
    '\a' NO match
    '/' NO match
    '\' NO match
    Both m: ... : and the balanced m{ ... } (my personal preference per TheDamian's regex PBPs) yield the same results.

    For a / ... / delimited match with the code above, the -delim strings:

    • '\\\/'  '\\\\\/' work as expected;
    • '\\/'  '\\\\/'  '\\\\\\/' fail to compile (Can't find string terminator "'" ...); and
    • '\/' works partially as expected (go figure).
    Again, the lesson seems to be: be wary of the presence of a delimiter character within a regex pattern.

    IIRC from previous regex compilation discussions (and please don't ask me for a citation :), I think what's happening here is that the regex parser looks for the end of a regex using various heuristics as soon as it sees that a regex has opened. In this case, it sees the forward-slash at the end of the first '\\/' (or whatever) single-quoted string and sometimes mistakes it for the regex terminal delimiter. The Perl parser looks for single-quoted strings thereafter, and goes off the rails when it sees that a final single-quote is unmatched. Or something like that... Anyway, don't use // regex delimiters here.

    Update: The "premature regex termination detection" theory is supported by the following: if the
        my $rx = qr{ $RE{'\\\\/'} $RE{'\\\/'} $RE{'\\/'} $RE{'\/'} $RE{'/'} };
    regex from Re^3: Delimiters in Regexp::Common (updated) is rewritten with qr/ ... / instead, the "Can't find string terminator "'" anywhere ..." compilation error results.
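
    The theory can be checked without Regexp::Common by compiling three variants of the same match with string eval. This is a minimal sketch under stated assumptions: the hash %h and the value 'x' are hypothetical stand-ins for $RE{delimited}{ -delim => ... }, since only the delimiter handling is being exercised. If the theory holds, the / ... / variant should die with the same "Can't find string terminator" error, while the m{ ... } variant and the variant that uses the hash element directly as the right-hand operand of =~ should compile.

```perl
use strict;
use warnings;

# %h is a hypothetical stand-in for $RE{delimited}{...}; its key is the
# thread's '\\/' spelling. Inside q{...} each \\ becomes \, so every
# snippet below contains the Perl source text '\\/'.
my %snippet = (
    slashes => q{my %h = ('\\\\/' => 'x'); 'x' =~ /$h{'\\\\/'}/ ? 1 : 0},
    braces  => q{my %h = ('\\\\/' => 'x'); 'x' =~ m{$h{'\\\\/'}} ? 1 : 0},
    bare    => q{my %h = ('\\\\/' => 'x'); 'x' =~ $h{'\\\\/'} ? 1 : 0},
);

for my $name (qw(slashes braces bare)) {
    # Compile and run the snippet; on a compile error eval returns undef
    # and leaves the message in $@.
    my $result = eval $snippet{$name};
    if (defined $result) {
        print "$name: compiled, match result $result\n";
    }
    else {
        my ($first_line) = split /\n/, $@;   # keep the report to one line
        print "$name: $first_line\n";
    }
}
```

    The / ... / variant ends at the unescaped slash inside '\\/', leaving an unterminated single-quoted string behind, which is the failure mode described above.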


    Give a man a fish:  <%-{-{-{-<

Re^5: Delimiters in Regexp::Common (updated)
by swl (Prior) on May 08, 2018 at 09:27 UTC

    Regexp::Common returns regexp objects, so one can drop the outer // and it will compile.

    print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '/' } );
    print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '\/' } );

    Then the escaping becomes a consideration.

    use 5.026;
    use Regexp::Common qw[ delimited ];
    say '\/';
    say $RE{delimited}{ -delim => '\/' };
    say '\\/';
    say $RE{delimited}{ -delim => '\\/' };
    say '\\\/';
    say $RE{delimited}{ -delim => '\\\/' };
    say '\\\\/';
    say $RE{delimited}{ -delim => '\\\\/' };

    produces

    \/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\ +.[^\\\/]*)*)(?:\/))) \/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\ +.[^\\\/]*)*)(?:\/))) \\/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\\)(?:[^\\]*(?:(?:\ +\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/))) \\/ (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\\)(?:[^\\]*(?:(?:\ +\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))

    It also appears that Regexp::Common does not de-duplicate the delimiter characters before it builds the regexp: the regexps become more complicated as the sequences increase in length, e.g. the three-character string \\/ yields two identical backslash alternatives.
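
    To illustrate what de-duplication would look like, here is a minimal sketch of a hypothetical helper, delimited_re. It is not part of Regexp::Common and is not how that module is implemented. It drops repeated delimiter characters, then builds one alternative per remaining delimiter, loosely following the shape of the generated patterns above: backslash is the escape character, and a backslash delimiter escapes itself by doubling. For simplicity it wraps the alternatives in a plain (?: ... ) group instead of the module's (?| ... ).

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex matching text enclosed by any one of
# the characters in $delims, with backslash as the escape character.
# Repeated delimiter characters are removed first, so the strings \/ and
# \\/ produce the same pattern.
sub delimited_re {
    my ($delims) = @_;
    my %seen;
    my @chars = grep { !$seen{$_}++ } split //, $delims;   # de-duplicate, keep order
    my $esc   = quotemeta '\\';                            # regex text for one backslash
    my @alts;
    for my $d (@chars) {
        my $q = quotemeta $d;
        if ($d eq '\\') {
            # Backslash is both delimiter and escape: a doubled backslash
            # stands for a literal backslash inside the delimited text.
            push @alts, $q . "[^$esc]*(?:" . ($esc x 2) . "[^$esc]*)*" . $q;
        }
        else {
            # Anything except the escape or the delimiter, or any escaped character.
            my $plain = "[^$esc$q]";
            push @alts, $q . $plain . "*(?:$esc." . $plain . '*)*' . $q;
        }
    }
    my $alternation = join '|', @alts;
    return qr/(?:$alternation)/;
}

# The string \\/ lists the backslash twice; only two alternatives are built.
print delimited_re('\\\\/'), "\n";
```

    Usage mirrors the thread: 'a/b/c' =~ delimited_re('\\\\/') matches, while 'a/b' does not.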

      Exactly, these results make 100% sense to me.

      But I don't understand your comment that escaping becomes a consideration (as in a problem?). I would say escaping becomes less of a consideration, because you can follow the normal quotation rules.
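
      The normal quotation rules are easy to check directly. In a Perl single-quoted string only \\ and \' are escapes and any other backslash is kept as-is, which is why the output above comes in identical pairs. A small sketch in plain Perl (nothing from Regexp::Common is used):

```perl
use strict;
use warnings;

# In a single-quoted string only \\ (backslash) and \' (quote) are escapes;
# any other backslash is kept literally. The four -delim spellings used in
# this thread therefore collapse into two distinct strings.
my @spellings = ('\/', '\\/', '\\\/', '\\\\/');

for my $s (@spellings) {
    printf "%-8s is %d characters long\n", $s, length $s;
}

# '\/' and '\\/' are both the two characters \/ ;
# '\\\/' and '\\\\/' are both the three characters \\/ .
my %distinct = map { $_ => 1 } @spellings;
print scalar(keys %distinct), " distinct strings\n";
```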

        A consideration in that one should bear it in mind, as distinct from it being a concern.