Re: Recursive regular expression weirdness

diotalevi provided the explanation of the problem. The following is a tested solution:

...

our $rxNest;
$rxNest = qr{(?x)
   (
      \(
      [^()]*
      (?:
         (??{$rxNest})
         [^()]*
      )*
      \)
   )
   (?{ [ @{$^R}, $1 ] })
};

our $rxOnlyNested = qr{(?x)
   (?{ [] })
   ^
   [^()]*
   (?:
      $rxNest
      [^()]*
   )*
   \z
   (?{ @memoList = @{$^R} })
};

...

    if ($string =~ $rxOnlyNested)

...
[download]

Note that @memoList is now correct whether there is backtracking or not, thanks to the use of $^R.

Note that I used $1 instead of $+. Using $+ will negatively impact on the performance of all regexps without captures executed in that perl interpreter instance, whereever they are located.

Update: Removed unnecessary parens. (Copy and paste bug.)

Comment on Re: Recursive regular expression weirdness Select or Download Code

Replies are listed 'Best First'.
Re^2: Recursive regular expression weirdness by johngg (Canon) on Mar 30, 2006 at 11:44 UTC
Thank you for providing a solution, which I think I understand. Please correct me if I am wrong. The `(?{ [] })` at the start of `$rxOnlyNested` creates an empty list reference and the reference will be held in `$^R` (the result of the last code block). In `$rxNest` the `(?{ [ @{$^R}, $1 ] })` at the end creates a new list reference which contains the dereferenced contents of the last list plus the memory group of the successful match. This list keeps growing as the regular expression recursed, in effect. Finally, in `$rxOnlyNested` the list is dereferenced and assigned to `@memoList`. I hope I have understood correctly and if it is doing what I think it is doing, it is a very neat solution. I am glad I have finally started visiting the Monastery as I am learning a lot of new techniques. Cheers, JohnGG	[reply] [d/l] [select]
Re^3: Recursive regular expression weirdness by ikegami (Patriarch) on Mar 30, 2006 at 15:41 UTC
That's exactly it. Now, you might wonder why I didn't add to @memoList directly. It might be best to use an example: `local $_ = 'abbbbbc'; { local @memoList; / a (?: (b) (?{ push @memoList, $1 }) )* bbc /x; print(@memoList, "\n"); # bbbbb # Wrong! There should only be three! } { local @memoList; / (?{ [] }) a (?: (b) (?{ [ @{$^R}, $1 ] }) )* bbc (?{ @memoList = @{$^R} }) /x; print(@memoList, "\n"); # bbb } { local @temp; local @memoList; / # (?{ @temp = (); }) # Redundant with 'local @temp'. a (?: (b) (?{ local @temp = (@temp, $1); }) )* bbc (?{ @memoList = @temp }) /x; print(@memoList, "\n"); # bbb }` [download] When the regexp engine finds that it read too many 'b's — rememeber that `` is greedy — it backtracks, "unreading" the last 'b' and eventually then a second last 'b'. `$^R` and `@temp` (since we keep using `local` to assign to `@temp`) are unwound when backtracking occurs, so the last assignment is undone with each "unreading". `@memoList` is not unwound, on the other hand, so it keeps holding the extra 'b's. A quick tests shows that using `$^R` is much* faster than using `local`. Update: Removed unnecessary parens. (Copy and paste bug.)	[reply] [d/l] [select]
Re^4: Recursive regular expression weirdness by johngg (Canon) on Mar 30, 2006 at 18:45 UTC
Thank you again for elaborating on your earlier reply. I can see from your examples how localisation is necessary to avoid, for example, `@memoList` getting too many 'b's. There is one thing that is confusing me though. In the second and third patterns I don't understand the significance of the extra (memory group?) brackets around the code block `(?{...})`, as in `((?{ @memoList = @{$^R} }))` and `((?{ @memoList = @temp }))`. Why do we need these when we are assigning within the pattern rather than having to remember outside of it? I must be missing something. Cheers, JohnGG	[reply] [d/l] [select]
Re^5: Recursive regular expression weirdness by ikegami (Patriarch) on Mar 30, 2006 at 19:05 UTC
Re^6: Recursive regular expression weirdness by johngg (Canon) on Mar 30, 2006 at 20:19 UTC