wind has asked for the wisdom of the Perl Monks concerning the following question:

List of Questions

Preface

Recursive regular expressions using (??{ code }) were part of my perl toolkit, but I haven't used them in 5+ years. However, I recently came across a forum post that inspired me to see if I could use them again for a regex challenge dealing with balanced parentheses.

I borrowed some old code from myself to get started and quickly hacked together a solution that has worked for me in the past. Unfortunately, the regex recursed only a single time. (()) matched, but ((())) would only match the inner two levels.

Going back to the drawing board at perlre, I decided to use the newer feature (?PARNO) to accomplish the task and was able to get it to work. However, this did not satisfy me, since I want to know more about my original problem with (??{ code }): is there something specific I'm doing wrong, and/or is there a version of perl in which it stopped working?

Example

The code below demonstrates my problem. It matches balanced braces and encloses them in a new tag, <##> </##>, that indicates how many characters they contain. When using (??{ $braces_re }), only a depth of 2 braces is matched, but using (?R) works as desired.

my $data = do {local $/; <DATA>};

my $braces_re = qr/
    \{
    (?:
        (?> [^{}]+ )
    |
        # Use either (?PARNO) or (??{ code }).
        #(?R)
        (??{ $braces_re })
    )*
    \}
/sx;

$data =~ s{($braces_re)}{
    my $len = length $1;
    "<$len> $1 </$len>"
}eg;

print $data;

1;

__DATA__
{ ... }
{ { } }
{ { { } } }
{ { } { { { } } } }
{ ... }

Output using (??{ $braces_re })

<7> { ... } </7>
<7> { { } } </7>
{ <7> { { } } </7> }
{ <3> { } </3> { <7> { { } } </7> } }
<7> { ... } </7>

Output using (?R)

<7> { ... } </7>
<7> { { } } </7>
<11> { { { } } } </11>
<19> { { } { { { } } } } </19>
<7> { ... } </7>
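
For comparison, here is a minimal sketch of the (?PARNO)-style approach written with a named group instead of (?R). It assumes perl 5.10 or later. The group name braces and the helper tag_braces are hypothetical names chosen for this illustration and are not part of the original script.

```perl
use strict;
use warnings;

# The named group "braces" holds the balanced-brace pattern, and
# (?&braces) recurses into it. Requires perl 5.10 or later.
my $braces_re = qr/
    (?<braces>
        \{
        (?:
            (?> [^{}]+ )
        |
            (?&braces)
        )*
        \}
    )
/x;

# Hypothetical helper: wrap every outermost balanced group in a
# <len> ... </len> tag, as the script in the post does.
sub tag_braces {
    my ($text) = @_;
    $text =~ s{($braces_re)}{
        my $len = length $1;
        "<$len> $1 </$len>"
    }eg;
    return $text;
}

print tag_braces("{ { { } } }"), "\n";
```

Recursing by name rather than with (?R) or (?1) means the pattern should keep working when it is interpolated into a larger regex, where the group numbers shift.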

Update using strictures

Now for the annoying bit. When writing this post originally, I wanted to report that I of course used strict and warnings, but I hadn't actually turned them on. Doing so reported this annoying error:

Global symbol "$braces_re" requires explicit package name at (re_eval +1) line 2.

I therefore quickly modified the script by predeclaring $braces_re:

my $braces_re;
$braces_re = qr/

And lo and behold, (??{ code }) now works as desired.
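
Put together, the predeclared form might look like the sketch below. It uses our rather than my, since one of the replies recommends package variables inside (??{ ... }), and the post reports that predeclaring works with either. The helper first_balanced is a hypothetical name added only for illustration.

```perl
use strict;
use warnings;

# Declare first, assign second, so the name is already in scope when the
# (??{ ... }) block inside the pattern is compiled.
our $braces_re;
$braces_re = qr/
    \{
    (?:
        (?> [^{}]+ )
    |
        (??{ $braces_re })
    )*
    \}
/x;

# Hypothetical helper: return the first balanced group found in $text,
# or undef if there is none.
sub first_balanced {
    my ($text) = @_;
    return $text =~ /($braces_re)/ ? $1 : undef;
}

print first_balanced("x { { { } } } y"), "\n";
```

The split matters because a declaration only takes effect in the statement after the one that contains it, so a single-statement `our $braces_re = qr/.../` leaves the name undeclared inside the pattern.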

I can of course work with this limitation, but it's not completely satisfying since it was not necessary with version 5.8.0 or before. I also find it a little disconcerting that the code still partly works when using the former method, if only to the first level of recursion.

Update, tracing change in: our $x = qr/(??{ $x })/

With ikegami's prodding, I remembered that my original working code used our as the declaration. Finding some old servers, I was able to trace the change in the way strictures validates:

Perl version                                    our $x = $x;   our $x = qr/(??{ $x })/;   my $x = qr/(??{ $x })/;
strawberry perl v5.12.1                         passes         fails                      fails
perl v5.10.0 built for i686-linux               passes         fails                      fails
perl v5.8.0 built for i386-linux-thread-multi   passes         passes                     passes

Please note that while the my declaration passes in v5.8.0, the recursive nature of the regex still fails in the same way it does currently, by recursing only 1 time.

Summary

Obviously, this feature is still documented as experimental, but I'm curious what other monks have to say or suggest about it. Can it still be something that I keep in my toolkit, or should I just assume that I'll need to stick with (?PARNO) from now on when I want recursion in a regex? Fortunately, it's not a common need, as there are often much better methods for parsing, but I like to know whether or not I still have the option of rolling my own solution using regexes if I'm so inclined.

Thank you for your advice/suggestions

Answers

Re: (??{ code }) versus (?PARNO) for recursive regular expressions
by ikegami (Patriarch) on Mar 25, 2011 at 22:32 UTC

    I can of course work with this limitation, but it's not completely satisfying since it was not necessary before (don't remember version of perl, 5+years ago)

    That's not true. Variable declarations have always taken effect in the statement following the one containing the declaration. Perhaps you weren't using lexical ("my") variables (or strict) "5+ years ago".


    Unrelated, don't use lexicals from outside the pattern inside of (?{ ... }), (??{ ... }) and anything similar.

    $ perl -wE'
       sub f {
          my $x = $_[0];
          "" =~ /(??{ say $x; "" })/;
       }
       f("abc");
       f("def");
    '
    Variable "$x" will not stay shared at (re_eval 1) line 1.
    abc
    abc

    Use a package variable instead.

    $ perl -wE'
       sub f {
          local our $x = $_[0];
          "" =~ /(??{ say $x; "" })/;
       }
       f("abc");
       f("def");
    '
    abc
    def

      The last time I used (??{ code }), 5 years ago, I declared everything using our and of course did use strict. However, that doesn't have any effect on this code now:

      our $braces_re = qr/ ... (??{ $braces_re }) ... ) /sx;

      Still throws a strictures warning. The only way I've observed being able to get it to work now is by having the variable predeclared, regardless of whether it's done using my or our.

      If it helps, it probably was more like 9-10 years ago. Basically very soon after the (??{ code }) feature was introduced.

        Still throws a strictures warning.

        Of course. Variable declarations include our.

        These work:

        1. no strict; $x = qr/(??{ $x })/;
        2. use strict; our $x = qr/(??{ our $x })/
        3. use strict; our $x = qr/(??{ no strict; $x })/
        4. use strict; our $x; $x = qr/(??{ $x })/
        5. use strict; my $x; $x = qr/(??{ $x })/

        I think there was once a bug where some or all pragmas didn't propagate into (??{ }) and the like. If so, the following would have worked because it would have been equivalent to #3 above:

        6. use strict; our $x = qr/(??{ $x })/

        The earliest I have here is 5.10.1, and it doesn't have this bug.

Re: (??{ code }) versus (?PARNO) for recursive regular expressions
by moritz (Cardinal) on Mar 26, 2011 at 09:26 UTC
    but I'm curious what any other monks have to say or suggest about this feature. Can it still be something that I keep in my toolkit, or should I just assume that I'll need to stick with (?PARNO) from now on when I want recursion in a regex.

    The (??{ code }) feature has been available since perl 5.8, and is still marked as experimental in 5.12. That's extraordinarily long, and the reason is that the feature has exhibited a multitude of problems. For example, in 5.10 it wouldn't recognize say even if the feature was enabled, would segfault when you executed too much code in it and then reentered the regex engine, and had some nasty scoping bugs.

    What I want to say is that it's really still experimental and unreliable. Whenever you've got the chance to use the (?PARNO) syntax instead, you should do it.

Re: (??{ code }) versus (?PARNO) for recursive regular expressions
by JavaFan (Canon) on Mar 26, 2011 at 10:44 UTC
    1) What is the proper syntax to declare a recursive regular expression using (??{ code })
    I'd say, the only proper way is to not use it. (??{ }) (and (?{ })) have many problems. Scoping is one of them. (??{ }) is inefficient, not self contained, and in general, a PITA to use. On top of that, they're buggy. And may crash your program:
    $ perl -wE '/(??{ s!!! })/'
    Use of uninitialized value $_ in substitution (s///) at (re_eval 1) line 1.
    ... repeated many times ...
    Segmentation fault
    $
    The reason the constructs are still marked experimental is that no one has stepped up and said "Not only do I know how to fix the issues, but here's a patch". For all we know, no one knows how to fix the issues.
    Does anyone know in what version of perl the following started throwing a strictures error: our $x = qr/(??{ $x })/, and of course why?
    Somewhere between 5.8.1 and 5.8.8. And probably due to a bug fix.
    Can anyone think of a single instance where (??{ code }) would be used for recursive regular expressions over the (?PARNO) feature, or should that usage be deprecated from one's toolbox?
    Only if code returns something different, perhaps because parameters are passed. Other than contrived examples, I cannot come up with a case where I'd prefer to use such a pattern.
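
As one admittedly contrived illustration of code that "returns something different", the sketch below matches a length-prefixed field such as 3:abc. The sub-pattern is built at match time from the number captured in $1, which a fixed (?PARNO) reference cannot express. The names $counted_re and payload are hypothetical and do not come from the thread.

```perl
use strict;
use warnings;

# The count captured in $1 decides how many characters the payload has,
# so the sub-pattern is generated while the match is running.
my $counted_re = qr/ \A (\d+) : ( (??{ ".{$1}" }) ) \z /x;

# Hypothetical helper: return the payload when its length matches the
# declared count, otherwise undef.
sub payload {
    my ($text) = @_;
    return $text =~ $counted_re ? $2 : undef;
}

print payload("3:abc"), "\n";
```

Given the bugs described in the replies above, this is probably better treated as a demonstration of what the construct can do than as a recommendation to use it.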