Re^4: Delimiters in Regexp::Common (updated)

Re^5: Delimiters in Regexp::Common (updated) by AnomalousMonk (Archbishop) on May 08, 2018 at 20:57 UTC
... the next lines do not compile: ...

I've played around with this some more and I'm coming to the conclusion that this has little or nothing to do with Regexp::Common::delimited and more to do with the use of a regex delimiter character within the regex pattern. The following works as I expect with any of `'\/' '\\/' '\\\/' '\\\\/' '\\\\\/' '\\\\\\/'` as the `-delim` delimiter specification:

    c:\@Work\Perl\monks\Veltro>perl -wMstrict -le
    "use Regexp::Common qw(delimited);
     ;;
     for my $s (qw(
       a/b/c a\b\c /a/ \a\ a//b a\\\\b // \\\\
       a/b a\b a/b\c a\b/c a/ /a a\ \a / \
       )) {
       print qq{'$s' }, $s =~ m{$RE{delimited}{ -delim => '\/' }} ? '' : 'NO ', ' match';
       }
    "
    'a/b/c' match
    'a\b\c' match
    '/a/' match
    '\a\' match
    'a//b' match
    'a\\b' match
    '//' match
    '\\' match
    'a/b' NO match
    'a\b' NO match
    'a/b\c' NO match
    'a\b/c' NO match
    'a/' NO match
    '/a' NO match
    'a\' NO match
    '\a' NO match
    '/' NO match
    '\' NO match

Both `m: ... :` and the balanced `m{ ... }` (my personal preference per TheDamian's regex PBPs) yield the same results.

For a `/ ... /` delimited match with the code above, the `-delim` strings:

- `'\\\/' '\\\\\/'` work as expected;
- `'\\/' '\\\\/' '\\\\\\/'` fail to compile (`Can't find string terminator "'" ...`); and
- `'\/'` works partially as expected (go figure).

Again, the lesson seems to be: be wary of the presence of a delimiter character within a regex pattern. IIRC from previous regex compilation discussions (and please don't ask me for a citation :), I think what's happening here is that the regex parser looks for the end of a regex using various heuristics as soon as it sees that a regex has opened, and in this case, it sees the forward-slash at the end of the first `'\\/'` (or whatever) single-quoted string and sometimes mistakes it for the regex terminal delimiter. The Perl parser looks for single-quoted strings thereafter, and goes off the rails when it sees that a final single-quote is unmatched. Or something like that...
Anyway, don't use `//` regex delimiters here.

Update: The "premature regex termination detection" theory is supported if the `my $rx = qr{ $RE{'\\\\/'} $RE{'\\\/'} $RE{'\\/'} $RE{'\/'} $RE{'/'} };` regex from Re^3: Delimiters in Regexp::Common (updated) is re-written with `qr/ ... /` instead: the `"Can't find string terminator "'" anywhere ..."` compilation error results.

Give a man a fish: `<%-{-{-{-<`
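One way to make sense of which `-delim` strings survive inside `/ ... /` is to look at backslash parity. The following is only a minimal sketch of that idea, assuming the scan for the closing `/` skips every backslash together with the character that follows it (a simplification of what perlop describes for parsing quoted constructs). `first_unescaped_slash` is a hypothetical helper written for this illustration, not part of Perl or Regexp::Common.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Perl or Regexp::Common API): return the offset
# of the first '/' that is not escaped by a backslash, or -1 if there is none.
# Assumption: a backslash always protects the single character after it.
sub first_unescaped_slash {
    my ($src) = @_;
    my $i = 0;
    while ($i < length $src) {
        my $c = substr $src, $i, 1;
        if ($c eq '\\') { $i += 2; next }   # skip the backslash and its partner
        return $i if $c eq '/';
        $i++;
    }
    return -1;
}

# Try one to six backslashes in front of the slash, as in the -delim strings.
for my $n (1 .. 6) {
    my $delim = ('\\' x $n) . '/';          # $n backslashes, then a slash
    my $body  = "\$RE{delimited}{ -delim => '${delim}' }";
    my $where = first_unescaped_slash($body);
    printf "%-10s %s\n", "'${delim}'",
        $where < 0 ? 'no early terminator found'
                   : "scan would stop at offset $where, inside the quoted string";
}
```

Under this model an even number of backslashes leaves the `/` exposed, which lines up with `'\\/' '\\\\/' '\\\\\\/'` being the strings that fail to compile. It does not explain why `'\/'` only partially works.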
Re^5: Delimiters in Regexp::Common (updated) by swl (Prior) on May 08, 2018 at 09:27 UTC
Regexp::Common returns regexp objects, so one can drop the outer // and it will compile.

    print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '/' } );
    print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '\/' } );

Then the escaping becomes a consideration.

    use 5.026;
    use Regexp::Common qw[ delimited ];
    say '\/';
    say $RE{delimited}{ -delim => '\/' };
    say '\\/';
    say $RE{delimited}{ -delim => '\\/' };
    say '\\\/';
    say $RE{delimited}{ -delim => '\\\/' };
    say '\\\\/';
    say $RE{delimited}{ -delim => '\\\\/' };

produces

    \/
    (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))
    \/
    (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))
    \\/
    (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))
    \\/
    (?:(?|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\\)(?:[^\\]*(?:(?:\\\\)[^\\]*)*)(?:\\)|(?:\/)(?:[^\\\/]*(?:\\.[^\\\/]*)*)(?:\/)))

It also appears that Regexp::Common does not de-duplicate the character sequence before it builds the regexp, as the regexps become more complicated as the sequences increase in length.
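The two effects seen in that output can be shown in plain Perl, without Regexp::Common. This is a rough sketch, not a patch for the module: the loop reports what each single-quoted literal actually contains, and `dedupe_chars` is a hypothetical helper showing how a caller could collapse repeated delimiter characters before passing them as `-delim`. It assumes `-delim` is treated as a set of single characters, which is what the generated alternations suggest.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Common): drop repeated
# characters from a string, keeping the first occurrence of each.
sub dedupe_chars {
    my ($chars) = @_;
    my %seen;
    return join '', grep { !$seen{$_}++ } split //, $chars;
}

# In single quotes only \\ and \' are escape sequences; any other
# backslash is kept literally, so some of these literals are equal.
for my $literal ('\/', '\\/', '\\\/', '\\\\/') {
    printf "%-4s length=%d deduped=%s\n",
        $literal, length $literal, dedupe_chars($literal);
}
```

Running it shows the first two literals holding the same two characters and the last two holding the same three, matching the pairs of identical regexps in the output. After de-duplication all four reduce to one backslash and one slash.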
Re^6: Delimiters in Regexp::Common (updated) by Veltro (Hermit) on May 08, 2018 at 10:40 UTC
Exactly, these results make 100% sense to me. But I don't understand your comment that escaping becomes a consideration (as in a problem?). I would say escaping becomes less of a consideration, because you can follow the normal quoting rules.
Re^7: Delimiters in Regexp::Common (updated) by swl (Prior) on May 08, 2018 at 12:30 UTC
A consideration in that one should bear it in mind, as distinct from it being a concern.