in reply to Re: Why is variable length lookahead implemented while lookbehind is not?
in thread Why is variable length lookahead implemented while lookbehind is not?

When you write a backtracking regex engine like the one in Perl 5, a full (variable length) look-ahead is pretty easy to implement.

It seems to me that the easy solution would produce bad performance. The easy solution requires matching each lookbehind subpattern at pos $current_pos - $max_match_length, which is 0 if /.*/ is used!

'bbbbbb' =~ /(?<=[^z]*a)b/

requires checking for "z" and "a" a total of 72 times each before failing even though the string is only 6 chars long!

Oops, I guess you covered this at the end, but it seems to me that it's a foremost concern.

It also forces

'xiyjykz' =~ /(?<=x(.*)y(.*))z/s
to produce
$1 = 'iyj'; $2 = 'k'
but one might expect
$1 = 'i'; $2 = 'jyk'

Replies are listed 'Best First'.
Re^3: Why is variable length lookahead implemented while lookbehind is not?
by moritz (Cardinal) on Sep 07, 2011 at 20:56 UTC
    When you write a backtracking regex engine like the one in Perl 5, a full (variable length) look-ahead is pretty easy to implement.

    It seems to me that the easy solution would produce bad performance. The easy solution requires matching each lookbehind subpattern at pos $current_pos - $max_match_length

    (emphasis mine)

    It seems we're talking about two different things here. The simple approach for look-ahead is not slow (at least if you take care not to backtrack the look-ahead). And that's probably the reason it's implemented in Perl, and most advanced regex engines.

    In the case of look-behind, we're in violent agreement :-)

Re^3: Why is variable length lookahead implemented while lookbehind is not?
by GrandFather (Saint) on Sep 07, 2011 at 20:42 UTC

    My first thought for an "easy solution" is to generate a reversed version of the target string if there is a variable length look behind, then use the usual look ahead match processing on the reversed string starting at the "reversed" anchor point.

    True laziness is hard work
      That is what is discussed at Variable-length lookbehind at dev.perl.org. The suggestion to write the variable-length lookbehind in right-to-left fashion was rejected as being obscure. The Perl regex-engine will have to do this reversing itself, which is not always possible or easy to do.

      CountZero

      A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      Reversing the string requires reversing the regex pattern too, and that's not trivial. For example, (?<=(.).*\1) would try to match \1.*(.), but \1 hasn't been populated yet.

      Update:

      ... What am I saying! Just have people write (.).*\1. That's what they'd have to do with a right-to-left parser too. I suspect all issues can be solved with proper documentation.