in reply to Re: ZERO_LENGTH match
in thread ZERO_LENGTH match

/( (?: a* | (?=c) ) )/x

This doesn't make any sense: a* cannot fail, so you'll
never be in a position to try the second
alternative.
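
A minimal sketch of that claim, assuming a standalone match with nothing following the group. The separate capture groups around each alternative and the sample strings are additions for illustration only; they are not part of the regex under discussion.

```perl
use strict;
use warnings;

# Hypothetical variant of the regex above: each alternative gets its own
# capture group so we can tell which one produced the match.
my $re = qr/(?: (a*) | ((?=c)) )/x;

my %first_alt_used;
for my $str ('aac', 'caa', 'bbb') {
    $str =~ $re or die "unexpected: no match on '$str'";
    # $1 is defined only if the a* alternative matched.
    $first_alt_used{$str} = defined $1 ? 1 : 0;
    printf "%-3s  start=%d  first alternative used: %s\n",
        $str, $-[0], $first_alt_used{$str} ? 'yes' : 'no';
}
```

In each case the match starts at offset 0 and comes from a*, even when it matches nothing.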

/( (?: a | (?=c) ) )*/x

This means: match as many 'a's as you can, and when
this becomes impossible, try the second alternative.
If it matches (with zero width) you'd face an infinite
loop that has to be broken, which is done by
allowing only one such zero-width match to happen.
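
As a rough illustration, the following sketch runs that regex against a sample string (the string 'aac' and the variable names are assumptions made for this example) and records where the overall match starts and ends. The point is only that the match terminates and consumes the leading 'a's.

```perl
use strict;
use warnings;

my $str = 'aac';
my ($start, $end, $matched);

# The group can match 'a' or a zero-width lookahead; the '*' repeats it.
if ($str =~ /( (?: a | (?=c) ) )*/x) {
    ($start, $end) = ($-[0], $+[0]);   # offsets of the whole match
    $matched = substr($str, $start, $end - $start);
}
print "matched '$matched' at $start..$end\n";
```

The lookahead can succeed in front of the 'c', but it consumes nothing, so the overall match is still just the two 'a's.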

Re^3: ZERO_LENGTH match
by hv (Prior) on Aug 02, 2005 at 01:15 UTC

    Yes, I was thinking as I wrote my reply that it would make more sense to break it as:

    /( (?: a+ | (?=c) ) )/x

    But I didn't want to introduce unnecessary complications for the OP, and I wasn't entirely sure there was no deep reason I was missing as to why the docs show /a*/ rather than /a+/ for this.
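
A small sketch of how the a+ form behaves, under the same illustrative assumption as before: the per-alternative capture groups and the sample strings are hypothetical additions, used only to show which branch matched.

```perl
use strict;
use warnings;

# Hypothetical variant of the a+ form, with one capture group per alternative.
my $re = qr/(?: (a+) | ((?=c)) )/x;

my %result;
for my $str ('aac', 'caa', 'bbb') {
    if ($str =~ $re) {
        # $1 is defined only when the a+ alternative matched.
        $result{$str} = defined $1 ? "a+ matched '$1'" : 'lookahead matched';
    } else {
        $result{$str} = 'no match';
    }
    print "$str: $result{$str}\n";
}
```

With a+ the first alternative can fail, so the lookahead branch becomes reachable ('caa'), and a string with neither 'a' nor 'c' ('bbb') does not match at all.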

    Hugo

Re^3: ZERO_LENGTH match
by Anonymous Monk on Aug 01, 2005 at 17:16 UTC


    OK, the first paragraph is wrong: if you backtrack
    you _can_ use the second alternative. But the fact is that
    you'll then try matching (?=c) at position 0, not at some
    position after the last 'a' in the string, which is the
    case in /( (?: a | (?=c) ) )*/x.
    Breaking the infinite loop only forces (?=c) to
    be tried at most once; it doesn't change the position
    at which this happens.

      Try this:

      #!/usr/bin/perl
      use re "eval";

      # (?{ CODE }) is a classical zero-width assertion
      # that always succeeds and it is used only for its
      # side-effects.
      my $non_zero_width = 'a(?{ print 1 })';
      my $zero_width     = '(?{ print 2 })';

      # These two should be equivalent according to perlre.
      my $re1 = qq/ (?: $non_zero_width | $zero_width )* /;
      my $re2 = qq/ (?: $non_zero_width )* | (?: $zero_width )? /;

      # But are they really?
      $_ = 'aaabbb';
      print "\n-----------------\n";
      /$re1/x;
      print "\n-----------------\n";
      /$re2/x;
      print "\n-----------------\n";

      The output is:
      -----------------
      1112
      -----------------
      111
      -----------------
      which proves my point.