in reply to Re: Recursive regular expression weirdness
in thread Recursive regular expression weirdness

Thank you for providing a solution, which I think I understand. Please correct me if I am wrong.

The (?{ [] }) at the start of $rxOnlyNested creates an empty list reference and the reference will be held in $^R (the result of the last code block). In $rxNest the (?{ [ @{$^R}, $1 ] }) at the end creates a new list reference which contains the dereferenced contents of the last list plus the memory group of the successful match. This list keeps growing as the regular expression recursed, in effect. Finally, in $rxOnlyNested the list is dereferenced and assigned to @memoList.

I hope I have understood correctly and if it is doing what I think it is doing, it is a very neat solution. I am glad I have finally started visiting the Monastery as I am learning a lot of new techniques.

Cheers,

JohnGG

Replies are listed 'Best First'.
Re^3: Recursive regular expression weirdness
by ikegami (Patriarch) on Mar 30, 2006 at 15:41 UTC

    That's exactly it.

    Now, you might wonder why I didn't add to @memoList directly. It might be best to use an example:

    local $_ = 'abbbbbc'; { local @memoList; / a (?: (b) (?{ push @memoList, $1 }) )* bbc /x; print(@memoList, "\n"); # bbbbb # Wrong! There should only be three! } { local @memoList; / (?{ [] }) a (?: (b) (?{ [ @{$^R}, $1 ] }) )* bbc (?{ @memoList = @{$^R} }) /x; print(@memoList, "\n"); # bbb } { local @temp; local @memoList; / # (?{ @temp = (); }) # Redundant with 'local @temp'. a (?: (b) (?{ local @temp = (@temp, $1); }) )* bbc (?{ @memoList = @temp }) /x; print(@memoList, "\n"); # bbb }

    When the regexp engine finds that it read too many 'b's — rememeber that * is greedy — it backtracks, "unreading" the last 'b' and eventually then a second last 'b'. $^R and @temp (since we keep using local to assign to @temp) are unwound when backtracking occurs, so the last assignment is undone with each "unreading". @memoList is not unwound, on the other hand, so it keeps holding the extra 'b's.

    A quick tests shows that using $^R is *much* faster than using local.

    Update: Removed unnecessary parens. (Copy and paste bug.)

      Thank you again for elaborating on your earlier reply. I can see from your examples how localisation is necessary to avoid, for example, @memoList getting too many 'b's. There is one thing that is confusing me though. In the second and third patterns I don't understand the significance of the extra (memory group?) brackets around the code block (?{...}), as in ((?{ @memoList = @{$^R} })) and ((?{ @memoList = @temp })). Why do we need these when we are assigning within the pattern rather than having to remember outside of it? I must be missing something.

      Cheers,

      JohnGG

        Copy and paste bug. Those parens are indeed unnecessary here.