in reply to Variable-Width Lookbehind (hacked via recursion)

Nice trick :-). Perhaps looking ahead only to immediately look back isn't necessary? The same applies to the other regexen, and all tests still pass:

my $re1 = qr{
    (?<target> x (?<digits> \d+ ) (?!\d) )
    (?<lookback>
        (?<=
            (?! (?<match> ab \g{digits} (?!\d) ) )
            . (?=(?&lookback)) .
        |
            (?=(?&match)) . .
        )
    )
}msx;

Re^2: Variable-Width Lookbehind (hacked via recursion)
by haukex (Archbishop) on Oct 26, 2017 at 05:04 UTC

    Excellent point, thank you for spotting that! I can confirm that the (?= ) around (?<lookback> ) can be removed in all cases in the root node (since (?<= ) is already zero-width). Makes the regexes even shorter! :-)
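
    A minimal sketch of why the wrapper is redundant, using a toy pattern rather than the regexes from the root node: since (?<= ) consumes nothing, wrapping it in (?= ) should not change where a match starts or ends. The input string, both patterns, and the spans() helper are made up for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper (not from the thread): "start-end" offsets of
    # every match of $re in $str, joined with commas.
    sub spans {
        my ($str, $re) = @_;
        my @spans;
        while ($str =~ /$re/g) {
            push @spans, join("-", $-[0], $+[0]);
        }
        return join(",", @spans);
    }

    # Toy input, not from the thread.
    my $str = "ab1 cb2 xd3";

    my $bare    = spans($str, qr/(?<=b)\d/);      # lookbehind on its own
    my $wrapped = spans($str, qr/(?=(?<=b))\d/);  # same lookbehind inside a lookahead

    print "bare:    $bare\n";
    print "wrapped: $wrapped\n";
    print "identical\n" if $bare eq $wrapped;
    ```

    Both variants report the same spans here, which is consistent with the lookahead wrapper adding nothing.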

    It's probably a vestige from the negative case like here or in the following, where the (?! (?<lookback> ... ) ) is needed*.

    # Match any /\d./ that is *not* preceded by an /a/
    my $re5 = qr{
        (?! (?<lookback>
            (?<= a | (?=(?&lookback)) . )
        ) )
        (?<target> \d . )
    }msx;
    my $re5_short = qr/(?!((?<= a |(?=(?-1)).))) (\d.) /sx;
    for my $regex ($re5, $re5_short) {
        unlike "fo",        $regex;
        unlike "x5",        $regex;
        unlike "ab5 x4",    $regex;
        like   "5ab",       $regex;
        like   "x5 ab5",    $regex;
        like   "x5 ab5 x2", $regex;
        my @results;
        while ("x2 4x3a55aaa1" =~ /$regex/g) {
            push @results, $+{target} // $2;
        }
        is_deeply \@results, ["2 ","4x","3a"];
    }

    * Update: Hmm, actually, it turns out this seems to work too... (although putting the exact explanation of why into words is eluding me at the moment...)

    my $re5 = qr{
        (?<lookback> (?<! a | (?!(?&lookback)) . ) )
        (?<target> \d . )
    }msx;
    my $re5_short = qr/((?<! a |(?!(?-1)).)) (\d.) /sx;
      putting the explanation of why into words...

      So the two key things to note are:

      • The pattern (?<!X) (for any character X) matches at the beginning of the string (because there is no preceding character), and
      • the double negation of (?<! (?! ) ) means that, as long as the a alternative does not match, whatever the inner call to (?&lookback) returns (match/no match) is what the outer (?<lookback> ) will return. So what the last, innermost (furthest left) lookback returns is what the whole, outermost lookback will return.
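
      Both points can be seen in isolation with a toy pattern (not one of the regexes from this thread) that replaces the recursive call with a plain "b". Under that assumption, (?<! (?!b) . ) should succeed either when there is no previous character at all, or when "b" matches one character back.

      ```perl
      use strict;
      use warnings;

      # Toy pattern: a "c" that is at the very start of the string or is
      # directly preceded by "b", written as a double negation.
      my $re = qr/(?<! (?!b) . ) c/x;

      # Toy inputs chosen for this sketch.
      for my $str ("c", "bc", "xc") {
          my $verdict = $str =~ $re ? "match" : "no match";
          printf "%-5s %s\n", qq{"$str"}, $verdict;
      }
      ```

      "c" matches because nothing precedes it, "bc" matches because the inner (?!b) fails one character back, and "xc" does not match because the inner (?!b) succeeds there.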

      So for the regex in question it boils down to two cases:

      • If there is no preceding "a", then the regex will recurse all the way to the beginning of the string, where lookback will match.
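      • If there is a preceding "a" anywhere earlier in the string, then the recursion reaches the position just after it, where the a alternative of (?<! ) matches and causes the match to fail.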
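
      A short sketch of the two cases, using the $re5 from the update above unchanged. The two input strings are made up for illustration and differ only in whether an "a" appears before the digit.

      ```perl
      use strict;
      use warnings;

      # Regex copied from the update above.
      my $re5 = qr{
          (?<lookback> (?<! a | (?!(?&lookback)) . ) )
          (?<target> \d . )
      }msx;

      # Toy inputs: no "a" before the digit, then an "a" before the digit.
      for my $str ("xyz5 ", "xaz5 ") {
          if ($str =~ $re5) {
              my ($target, $offset) = ($+{target}, $-[0]);
              print qq{"$str": matched "$target" at offset $offset\n};
          }
          else {
              print qq{"$str": no match\n};
          }
      }
      ```

      In the first string the recursion walks back over "z", "y" and "x" to the start of the string, so the digit is accepted. In the second it stops at the position right after the "a", and that failure propagates back out.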

      Minor edit for clarification.