Thank you for the reply. I'm obviously not understanding something, as I thought my (?{...}) block was at the very end of the regular expression. I didn't think it would be reached and executed unless both the look-behind and the look-ahead succeeded. I expected the matching to go something like this:

    Initialise pointer to beginning of string
    test look-behind, nothing preceding so fails
    advance pointer one place
    test look-behind, '{' so fails
    advance pointer one place
    test look-behind, 'x' so fails
    advance pointer one place
    test look-behind, '1' so fails
    advance pointer one place
    test look-behind, '}' so succeeds
    test look-ahead, '[' also succeeds
    code block encountered, execute
    advance pointer one place
    test look-behind, '[' so fails
    ...
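A minimal reproduction of the behaviour being asked about, offered as a sketch: the sample string '{x1}[y2]' and the counter $count are hypothetical, since the original test string and code are not shown in this post. Apart from the code block, the pattern is entirely zero-width, and the code block sits at the very end.

```perl
use strict;
use warnings;

# Hypothetical sample string: a '}' immediately followed by '['.
my $string = '{x1}[y2]';
my $count  = 0;

# Zero-width match between '}' and '['; the code block is the last
# item in the pattern and simply counts how often it is executed.
my @matches = $string =~ /(?<=\})(?=\[)(?{ $count++ })/g;

printf "matches: %d, code block executions: %d\n", scalar @matches, $count;
```

If the explanation given later in the thread holds, this reports one match but two code-block executions.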

If the regex engine had to look past my code block to check something else and then backtrack, the behaviour would make sense. However, I can't see how that is happening in this case.

I'm sure it must be me failing to get my head around something fundamental. Please could you point out where my understanding is lacking.

Cheers,

JohnGG

Re^3: Regex code block executes twice per match using look-arounds
by ikegami (Patriarch) on Jul 12, 2007 at 18:33 UTC
    test look-behind, '}' so succeeds
    test look-ahead, '[' also succeeds
    code block encountered, execute
    advance pointer one place

    That last step is wrong. It should be:

    advance pointer the length of the match

    The length of the match is zero when both the look-ahead and the look-behind are used. Since the pointer is not advanced, the regexp matches at the same position twice, and the code block runs both times. Only a final check prevents the regexp from returning an identical match.

    test look-behind, '}' so succeeds
    test look-ahead, '[' also succeeds
    code block encountered, execute
    same match? no, continue
    advance pointer the length of the match (0)
    test look-behind, '}' so succeeds
    test look-ahead, '[' also succeeds
    code block encountered, execute
    same match? yes, backtrack
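The "advance pointer the length of the match (0)" step can be observed directly with pos() and scalar-context //g. This is a sketch using a hypothetical sample string: after the zero-length match, pos() still points at the match position; the next attempt from that same position refuses a second, identical empty match, finds nothing further, and the failed //g match resets pos().

```perl
use strict;
use warnings;

my $string = '{x1}[y2]';    # hypothetical sample string

# First scalar //g match: zero-length, found between '}' and '['.
my $first     = $string =~ /(?<=\})(?=\[)/g;
my $pos_first = pos $string;    # 4: a zero-length match does not move pos()

# Second attempt starts at the same position. Another zero-length match
# there would be identical, so it is rejected; nothing else matches.
my $second     = $string =~ /(?<=\})(?=\[)/g;
my $pos_second = pos $string;   # undef: a failed //g match resets pos()

print 'first:  ', ($first  ? 'match' : 'no match'), ", pos = $pos_first\n";
print 'second: ', ($second ? 'match' : 'no match'), ', pos = ',
    (defined $pos_second ? $pos_second : 'undef'), "\n";
```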

    More on this in Re: Regex code block executes twice per match using look-arounds.

      advance pointer the length of the match

      That was the fundamental piece of the puzzle that I was missing. Indeed, after jettero's second reply I started investigating with use re 'debug'; and could see the mechanism you describe.

      I changed the regex so that it kept the look-behind but replaced the look-ahead with a simple capture:

      ...
      my $rxBetween = qr {(?x)
          (?<=($rxClose))
          ($rxOpen)
          (?{print qq{Match @{ [++ $count] }: on left $1, on right $2\n}})
      };
      ...
      $string =~ s{$rxBetween}{+$2}g;
      ...

      and that stopped the double execution. It also seemed clear from the debug output that using both look-arounds was making the engine do a lot more work.
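A self-contained sketch of the capture-based version, assuming Perl 5.18 or later. The elided $rxClose and $rxOpen are replaced by hypothetical stand-ins matching '}' and '[', the sample string is hypothetical, and the print is swapped for a counter. Because each match now consumes the opening character, no zero-length match occurs, so the code block should run once per match.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the elided $rxClose and $rxOpen patterns.
my $rxClose = qr/\}/;
my $rxOpen  = qr/\[/;

my $string = '{x1}[y2]';    # hypothetical sample string
my $count  = 0;

# Look-behind kept, look-ahead replaced by a capture: the match now
# consumes the opener, so it is never zero-length.
my $rxBetween = qr/(?<=($rxClose))($rxOpen)(?{ $count++ })/;

# Insert '+' between each closer/opener pair, as in the post.
(my $result = $string) =~ s/$rxBetween/+$2/g;

print "result: $result, code block executions: $count\n";
```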

      Thank you for your replies and the insights they have given.

      Cheers,

      JohnGG

Re^3: Regex code block executes twice per match using look-arounds
by jettero (Monsignor) on Jul 12, 2007 at 15:43 UTC
    Yes, well, the zero-width look-behind and look-ahead are clearly causing it to do something other than the obvious. I suspect the NFA (thanks sgt, interesting) is looking for something after the zero-width thingies... Perhaps it would behave less oddly if you put a '.' before your code? Who really knows what the *FA is really doing?
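The '.' suggestion can be checked with a short sketch (hypothetical sample string again). The look-ahead is kept, but '.' then consumes the '[' it just checked, so the match is one character long rather than zero-width, and the code block is expected to run only once.

```perl
use strict;
use warnings;

my $string = '{x1}[y2]';    # hypothetical sample string
my $count  = 0;

# Same look-arounds as before, plus '.' before the code block so the
# match consumes one character and is no longer zero-length.
my @matches = $string =~ /(?<=\})(?=\[).(?{ $count++ })/g;

printf "matches: %d (%s), code block executions: %d\n",
    scalar @matches, join(',', @matches), $count;
```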

    Any way you look at it, your regex is unusual since all of its expressions are zero-width, so they kinda match? And the when of code embedded in a regex isn't all that well defined, I bet, since the perlre page calls it experimental...

    I don't think it's changed in the last 5 years. I wonder when it'll stop being experimental.

    -Paul