Re^2: Delimiters in Regexp::Common

Replies are listed 'Best First'.
Re^3: Delimiters in Regexp::Common (updated) by AnomalousMonk (Archbishop) on May 07, 2018 at 14:46 UTC
The point is that in: `print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\\\/' }/ );` the quoted string becomes `-delim => '\/' ;` In: `print "P1 has path\n" if ($P1 =~ /$RE{delimited}{ -delim => '\/' }/ );` the quoted string becomes: `-delim => '/' ;` This is because the `\` is treated as an escape character between `//`.

I disagree. In the index expression of an array (positional or associative), the expression is evaluated in scalar context and not in the double-quotish context of a regex into which the array element may happen to be interpolated. So `'\\\/'` and `'\/'` are evaluated in single-quotish context and become the character sequences `\\/` and `\/` respectively. And because of the way backslashes are interpreted in single-quote context, `'\\\\/'` and `'\\\/'` are equivalent, and `'\\/'` and `'\/'` likewise. E.g.:

`c:\@Work\Perl\monks\Veltro>perl -wMstrict -MData::Dump -le "my %RE = ( '\\\\/' => 'BackBackFwd1', '\\\/' => 'BackBackFwd2', '\\/' => 'BackFwd1', '\/' => 'BackFwd2', '/' => 'Fwd', ); dd \%RE; ;; my $rx = qr{ $RE{'\\\\/'} $RE{'\\\/'} $RE{'\\/'} $RE{'\/'} $RE{'/'} }; print $rx; "`

Output:

`{ "/" => "Fwd", "\\/" => "BackFwd2", "\\\\/" => "BackBackFwd2" }`

`(?^: BackBackFwd2 BackBackFwd2 BackFwd2 BackFwd2 Fwd )`

There are a couple of `Data::Dump::dd()` and hash peculiarities:

Why is the key of the value `"BackBackFwd2"` in the `dd` dump represented as `"\\\\/"` when it's given as `'\\\/'` in the hash definition? This is an artifact of the way `dd` represents strings only as double-quoted strings, in which a single literal backslash can only be written as the `"\\"` escape sequence.

Why is there no `'BackBackFwd1'` value in the hash? Because the `'\\\\/'` and `'\\\/'` string literals compile to identical character sequences (update: i.e., identical keys), and the second key (with the value `'BackBackFwd2'`) supersedes the first. (And likewise with `'BackFwd1'`.)

Give a man a fish: `<%-{-{-{-<`
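The single-quote rule described above can also be checked directly, without a hash or a regex. This is a minimal sketch; the variable names are made up for illustration, and it relies only on the documented behaviour that inside single quotes a backslash is special only before another backslash or the closing quote.

```perl
use strict;
use warnings;

# Inside single quotes only \\ and \' are escapes; a backslash in front
# of any other character (such as /) stays in the string as it is.
my $four  = '\\\\/';   # four backslashes typed in the source
my $three = '\\\/';    # three backslashes typed
my $two   = '\\/';     # two backslashes typed
my $one   = '\/';      # one backslash typed

# Character counts of the resulting strings.
printf "%d %d %d %d\n", map { length } $four, $three, $two, $one;

# Pairwise comparison of the resulting strings.
print "four eq three\n" if $four eq $three;
print "two eq one\n"    if $two  eq $one;
```

The four literals collapse to two distinct strings, which is why the hash in the example above ends up with only three keys.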
Re^4: Delimiters in Regexp::Common (updated) by Veltro (Hermit) on May 07, 2018 at 23:02 UTC
You may want to have another look at this, because neither of the following lines compiles:

`print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '/' }/ );`

`print "P2 has path\n" if ($P2 =~ /$RE{delimited}{ -delim => '\\/' }/ );`
Re^5: Delimiters in Regexp::Common (updated) by AnomalousMonk (Archbishop) on May 08, 2018 at 20:57 UTC
"... the next lines do not compile: ..."

I've played around with this some more and I'm coming to the conclusion that this has little or nothing to do with Regexp::Common::delimited and more to do with the use of a regex delimiter character within the regex pattern. The following works as I expect with any of `'\/' '\\/' '\\\/' '\\\\/' '\\\\\/' '\\\\\\/'` as the `-delim` delimiter specification:

`c:\@Work\Perl\monks\Veltro>perl -wMstrict -le "use Regexp::Common qw(delimited); ;; for my $s (qw( a/b/c a\b\c /a/ \a\ a//b a\\\\b // \\\\ a/b a\b a/b\c a\b/c a/ /a a\ \a / \ )) { print qq{'$s' }, $s =~ m{$RE{delimited}{ -delim => '\/' }} ? '' : 'NO ', ' match'; } "`

Output:

`'a/b/c' match 'a\b\c' match '/a/' match '\a\' match 'a//b' match 'a\\b' match '//' match '\\' match 'a/b' NO match 'a\b' NO match 'a/b\c' NO match 'a\b/c' NO match 'a/' NO match '/a' NO match 'a\' NO match '\a' NO match '/' NO match '\' NO match`

Both `m: ... :` and the balanced `m{ ... }` (my personal preference per TheDamian's regex PBPs) yield the same results. For a `/ ... /` delimited match with the code above, the `-delim` strings: `'\\\/' '\\\\\/'` work as expected; `'\\/' '\\\\/' '\\\\\\/'` fail to compile (`Can't find string terminator "'" ...`); and `'\/'` works partially as expected (go figure). Again, the lesson seems to be: be wary of the presence of a delimiter character within a regex pattern.

IIRC from previous regex compilation discussions (and please don't ask me for a citation :), I think what's happening here is that the regex parser looks for the end of a regex using various heuristics as soon as it sees that a regex has opened, and in this case, it sees the forward-slash at the end of the first `'\\/'` (or whatever) single-quoted string and sometimes mistakes it for the regex terminal delimiter. The Perl parser looks for single-quoted strings thereafter, and goes off the rails when it sees that a final single-quote is unmatched. Or something like that...

Anyway, don't use `//` regex delimiters here.

Update: The "premature regex termination detection" theory is supported if the `my $rx = qr{ $RE{'\\\\/'} $RE{'\\\/'} $RE{'\\/'} $RE{'\/'} $RE{'/'} };` regex from Re^3: Delimiters in Regexp::Common (updated) is re-written with `qr/ ... /` instead: the `Can't find string terminator "'" anywhere ...` compilation error results.

Give a man a fish: `<%-{-{-{-<`
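Two ways of keeping a slash-bearing subscript away from `/ ... /` delimiters can be sketched as follows. To stay self-contained, this uses a hypothetical `%PAT` hash as a stand-in for `%RE`, with a made-up pattern as its value, so it illustrates only the delimiter issue and not the behaviour of Regexp::Common itself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for %RE: the key contains a forward slash and
# the value is a pattern matching either a backslash or a slash.
my %PAT = ( '\/' => qr{[\\/]} );

my $path = 'a/b';

# 1. Balanced delimiters: the slash inside the subscript cannot be
#    taken for the end of the pattern.
my $hit_braces = $path =~ m{$PAT{'\/'}} ? 1 : 0;

# 2. Do the hash lookup in ordinary Perl code first, then interpolate
#    a plain scalar, so the /.../ literal contains no slash at all.
my $rx = $PAT{'\/'};
my $hit_slashes = $path =~ /$rx/ ? 1 : 0;

print "braces: $hit_braces, slashes: $hit_slashes\n";
```

The second form has the side benefit that the subscript is visibly ordinary Perl code, so there is no question about which quoting rules apply to it.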
Re^5: Delimiters in Regexp::Common (updated) by swl (Prior) on May 08, 2018 at 09:27 UTC
Regexp::Common returns regexp objects, so one can drop the outer `//` and it will compile.

`print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '/' } );`

`print "P2 has path\n" if ($P2 =~ $RE{delimited}{ -delim => '\/' } );`

Then the escaping becomes a consideration.

`use 5.026; use Regexp::Common qw[ delimited ]; say '\/'; say $RE{delimited}{ -delim => '\/' }; say '\\/'; say $RE{delimited}{ -delim => '\\/' }; say '\\\/'; say $RE{delimited}{ -delim => '\\\/' }; say '\\\\/'; say $RE{delimited}{ -delim => '\\\\/' };`

produces

`\/`

`(?:(?|(?:\\)(?:[^\\](?:(?:\\\\)[^\\]))(?:\\)|(?:\/)(?:[^\\\/](?:\\.[^\\\/]))(?:\/)))`

`\/`

`(?:(?|(?:\\)(?:[^\\](?:(?:\\\\)[^\\]))(?:\\)|(?:\/)(?:[^\\\/](?:\\.[^\\\/]))(?:\/)))`

`\\/`

`(?:(?|(?:\\)(?:[^\\](?:(?:\\\\)[^\\]))(?:\\)|(?:\\)(?:[^\\](?:(?:\\\\)[^\\]))(?:\\)|(?:\/)(?:[^\\\/](?:\\.[^\\\/]))(?:\/)))`

`\\/`

`(?:(?|(?:\\)(?:[^\\](?:(?:\\\\)[^\\]))(?:\\)|(?:\\)(?:[^\\](?:(?:\\\\)[^\\]))(?:\\)|(?:\/)(?:[^\\\/](?:\\.[^\\\/]))(?:\/)))`

It also appears that Regexp::Common does not de-duplicate the characters of the delimiter string before it builds the regexp: a repeated backslash produces a repeated alternative, so the regexps grow as the delimiter strings get longer.
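If the repeated alternatives are unwanted, one option is to strip duplicate characters from the delimiter string before handing it over. The sketch below is only an illustration: `unique_delims` is a hypothetical helper, not part of Regexp::Common, and whether the shorter regexp matters in practice has not been measured here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Regexp::Common): keep only the first
# occurrence of each character in a delimiter string.
sub unique_delims {
    my ($delims) = @_;
    my %seen;
    return join '', grep { !$seen{$_}++ } split //, $delims;
}

my $raw   = '\\\/';                # the three characters \ \ /
my $clean = unique_delims($raw);   # the two characters \ /

print "$raw -> $clean\n";

# Assumed usage once Regexp::Common is loaded:
#   my $re = $RE{delimited}{ -delim => $clean };
```

The helper preserves the order of first appearance, so the order of the alternatives in the generated regexp should be unaffected.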
Re^6: Delimiters in Regexp::Common (updated) by Veltro (Hermit) on May 08, 2018 at 10:40 UTC
Re^7: Delimiters in Regexp::Common (updated) by swl (Prior) on May 08, 2018 at 12:30 UTC