Re: ZERO_LENGTH match

I think you are misinterpreting the intent of the documentation: when it refers to 'ZERO_LENGTH' in the equivalence, it is talking about actual zero length (i.e. an iteration that really matched the empty string) rather than potential zero length, such as a subpattern like /c?/ that merely could match nothing.

The implication is therefore that, for example:

  /( (?: c? )* )/x

is treated as equivalent to:

  /( (?: c* | ) )/x

to break the loop.
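
To check this for yourself, here is a minimal sketch (the test strings and variable names are my own, not from the docs) that runs both forms and compares what they capture. It only shows that the two patterns agree on these inputs and that the starred form terminates; it is not a proof of equivalence in general.

```perl
use strict;
use warnings;

# Compare the looping form with the "loop broken" form on a few strings.
# The strings are arbitrary examples chosen for illustration.
my (@looped, @broken);
for my $str ('ccd', 'd', 'c') {
    $str =~ /( (?: c? )* )/x or die "looping form failed on '$str'";
    push @looped, $1;

    $str =~ /( (?: c* | ) )/x or die "broken form failed on '$str'";
    push @broken, $1;

    print "'$str': looping form captured '$looped[-1]', ",
          "broken form captured '$broken[-1]'\n";
}
```

Both forms should capture 'cc', '' and 'c' respectively, and the first one returns instead of spinning on the empty match.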

Similarly for a more complex zero-length expression such as a lookahead:

  /( (?: a | (?=c) ) )*/x

is treated as equivalent to:

  /( (?: a* | (?=c) ) )/x
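
As a rough illustration (the input strings are my own assumption, chosen so that a 'c' follows the run of 'a's or starts the string), the sketch below records the overall match of each form. It shows that the starred lookahead form terminates and that both forms consume the same text on these inputs, nothing more.

```perl
use strict;
use warnings;

# Record the full match ($&) of the starred form and of the rewritten form.
my (%starred, %rewritten);
for my $str ('aac', 'c') {
    $str =~ /( (?: a | (?=c) ) )*/x or die "starred form failed on '$str'";
    $starred{$str} = $&;

    $str =~ /( (?: a* | (?=c) ) )/x or die "rewritten form failed on '$str'";
    $rewritten{$str} = $&;

    print "'$str': starred matched '$starred{$str}', ",
          "rewritten matched '$rewritten{$str}'\n";
}
```

On 'aac' both forms should match 'aa', and on 'c' both should match the empty string at position 0.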

Hope this helps,

Hugo

Re^2: ZERO_LENGTH match by Anonymous Monk on Aug 01, 2005 at 17:01 UTC
  /( (?: a* | (?=c) ) )/x

This doesn't make any sense: a* cannot fail, so you'll never end up in a position to try the second alternative.

  /( (?: a | (?=c) ) )*/x

This means: match as many 'a's as you can, and when that becomes impossible, try the second alternative. If it matches (with zero width), you face an infinite loop that has to be broken, and that is done by allowing only one such zero-width match to happen.
Re^3: ZERO_LENGTH match by hv (Prior) on Aug 02, 2005 at 01:15 UTC
Yes, I was thinking as I wrote my reply that it would make more sense to break it as:

  /( (?: a+ | (?=c) ) )/x

But I didn't want to introduce unnecessary complications for the OP, and I wasn't entirely sure there was no deep reason I was missing as to why the docs show /a*/ rather than /a+/ for this.

Hugo
Re^3: ZERO_LENGTH match by Anonymous Monk on Aug 01, 2005 at 17:16 UTC
OK, the first paragraph is wrong: if you backtrack you _can_ use the second alternative. But the fact is that you'll then try matching (?=c) at position 0, not at some position after the last 'a' in the string, which is what happens in /( (?: a | (?=c) ) )*/x. Breaking the infinite loop only forces (?=c) to be tried just once; it doesn't redefine the position at which this happens.
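
A small sketch of the position argument, assuming a Perl recent enough that (?{ ... }) blocks in a literal pattern need no `use re 'eval'`. The package variable @tried and the string 'aac' are made up for illustration; pos() inside the code block reports the current match position. Note that this only covers the case where the match succeeds without backtracking into the second alternative.

```perl
use strict;
use warnings;

our @tried;    # positions at which the zero-width branch was entered
my $str = 'aac';

# Starred form: the zero-width branch is reached only after the 'a's run out.
@tried = ();
$str =~ /(?: a | (?{ push @tried, pos() }) (?=c) )*/x;
my @star_positions = @tried;

# Rewritten form: a* succeeds straight away, so the branch is not entered.
@tried = ();
$str =~ /(?: a* | (?{ push @tried, pos() }) (?=c) )/x;
my @alt_positions = @tried;

print "starred form entered the zero-width branch at: @star_positions\n";
print "a* form entered the zero-width branch at: @alt_positions\n";
```

I would expect the starred form to report position 2 exactly once and the a* form to report nothing.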
Re^4: ZERO_LENGTH match by fro (Novice) on Aug 02, 2005 at 08:09 UTC
Try this:

  #!/usr/bin/perl
  use re "eval";

  # (?{ CODE }) is a classical zero-width assertion
  # that always succeeds and it is used only for its
  # side-effects.
  my $non_zero_width = 'a(?{ print 1 })';
  my $zero_width     = '(?{ print 2 })';

  # These two should be equivalent according to perlre.
  my $re1 = qq/ (?: $non_zero_width | $zero_width )* /;
  my $re2 = qq/ (?: $non_zero_width )* | (?: $zero_width )? /;

  # But are they really?
  $_ = 'aaabbb';
  print "\n-----------------\n";
  /$re1/x;
  print "\n-----------------\n";
  /$re2/x;
  print "\n-----------------\n";

The output is:

  -----------------
  1112
  -----------------
  111
  -----------------

which proves my point.