wind has asked for the wisdom of the Perl Monks concerning the following question:
Recursive regular expressions using (??{ code }) were part of my perl toolkit, but I haven't used them in 5+ years. However, I recently came across a forum post that inspired me to try them again on a regex challenge involving balanced parentheses.
I borrowed some of my own old code to get started and quickly hacked together a solution that had worked for me in the past. Unfortunately, the regex recursed only once: (()) matched, but ((())) would only match the inner two levels.
Going back to the drawing board at perlre, I decided to use the newer (?PARNO) feature instead and was able to get it to work. That did not satisfy me, though, because I want to understand my original problem with (??{ code }): is there something specific I'm doing wrong, and/or is there a version of perl in which it stopped working?
The code below demonstrates my problem. It matches balanced braces and wraps each match in a new tag, <##> </##>, that indicates how many characters it contains. When using (??{ $braces_re }), only a depth of 2 braces is matched, but using (?R) works as desired.
```perl
my $data = do {local $/; <DATA>};

my $braces_re = qr/
    \{
        (?:
            (?> [^{}]+ )
        |
            # Use either (?PARNO) or (??{ code }).
            #(?R)
            (??{ $braces_re })
        )*
    \}
/sx;

$data =~ s{($braces_re)}{
    my $len = length $1;
    "<$len> $1 </$len>"
}eg;

print $data;

1;

__DATA__
{ ... }
{ { } }
{ { { } } }
{ { } { { { } } } }
{ ... }
```

Output using (??{ $braces_re })

```
<7> { ... } </7>
<7> { { } } </7>
{ <7> { { } } </7> }
{ <3> { } </3> { <7> { { } } </7> } }
<7> { ... } </7>
```

Output using (?R)

```
<7> { ... } </7>
<7> { { } } </7>
<11> { { { } } } </11>
<19> { { } { { { } } } } </19>
<7> { ... } </7>
```
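For comparison, here is a sketch of the (?PARNO) approach. (?R) recurses into the whole pattern, which works in the script above only because the whole pattern is the brace group. When the balanced part sits inside a larger pattern, a numbered recursion such as (?2) re-enters just that capture group. The function-call syntax and the sample string are hypothetical, chosen only for illustration; (?PARNO) needs perl 5.10 or later.

```perl
use strict;
use warnings;

# Group 1 captures the called name; group 2 is the balanced (...) part.
# (?2) re-enters group 2 only, so nested groups are absorbed into the
# argument list instead of restarting the whole pattern at \w+.
my $call_re = qr/
    \b (\w+) \s*
    (
        \(
            (?:
                (?> [^()]+ )      # run of non-parenthesis characters
            |
                (?2)              # recurse into group 2 (a nested group)
            )*
        \)
    )
/x;

# Hypothetical input, for illustration only.
my $src = 'x = max(a, min(b, (c + d))) + len(s);';

my @calls;
while ($src =~ /$call_re/g) {
    push @calls, "$1$2";
}
print "$_\n" for @calls;
```

This should print max(a, min(b, (c + d))) and len(s). The inner min(...) call is not reported separately because it lies inside the first match.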
Now for the annoying bit. When writing this post, I wanted to report that I had of course used strict and warnings, but I hadn't actually turned them on. Once I did, perl reported this error:
Global symbol "$braces_re" requires explicit package name at (re_eval +1) line 2.
I therefore quickly modified the script by predeclaring $braces_re:
my $braces_re; $braces_re = qr/
And lo and behold, (??{ code }) now works as desired.
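For reference, here is a sketch of the complete script with strict and warnings enabled and the variable predeclared as described above. Two details are assumptions made to keep the example self-contained: the sample data comes from a heredoc instead of __DATA__, and each sample sits on its own line.

```perl
use strict;
use warnings;

# Sample input: one brace group per line (layout assumed).
my $data = <<'END';
{ ... }
{ { } }
{ { { } } }
{ { } { { { } } } }
{ ... }
END

# Declare first, assign second, so the $braces_re inside (??{ })
# is the same lexical that receives the finished pattern.
my $braces_re;
$braces_re = qr/
    \{
        (?:
            (?> [^{}]+ )          # run of non-brace characters
        |
            (??{ $braces_re })    # recurse into a nested group
        )*
    \}
/sx;

# Wrap each outermost balanced group in <len> ... </len> tags.
$data =~ s{($braces_re)}{
    my $len = length $1;
    "<$len> $1 </$len>"
}eg;

print $data;
```

Run this way, every group is wrapped in full, giving the same five lines as the (?R) output.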
I can of course work with this limitation, but it's not completely satisfying, since it was not necessary in version 5.8.0 or earlier. I also find it a little disconcerting that the former method still partially works, recursing only one level.
With ikegami's prodding, I remembered that my original working code used our as the declaration. By digging up some old servers, I was able to trace when the way strict validates these declarations changed:
| Perl version (under use strict) | our $x = $x; | our $x = qr/(??{ $x })/; | my $x = qr/(??{ $x })/; |
|---|---|---|---|
| strawberry perl v5.12.1 | passes | fails | fails |
| perl v5.10.0 built for i686-linux | passes | fails | fails |
| perl v5.8.0 built for i386-linux-thread-multi | passes | passes | passes |
Please note that while the my declaration passes strict in v5.8.0, the regex still recurses only once there, failing in the same way it does in current versions.
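The my column is consistent with a documented scoping rule (perlsub): a variable declared with my is not introduced until after the current statement. Any $x on the right-hand side of my $x = ... therefore refers to whatever $x was visible before, which matches the error above, where the $braces_re inside the (??{ }) block did not resolve to the lexical being declared. The sketch below shows the rule with a plain string instead of a regex; the variable names are illustrative only.

```perl
use strict;
use warnings;

our $x = 'package';          # package variable $main::x
my $seen;
{
    # The new lexical $x is not visible until this statement ends,
    # so the $x on the right-hand side is still the package variable.
    my $x = "initialized from $x";
    $seen = $x;
}
print "$seen\n";
```

This prints "initialized from package". Without the our line, strict would reject the right-hand $x as an undeclared global.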
Obviously, this feature is still documented as experimental, but I'm curious what other monks have to say or suggest about it. Can it still be part of my toolkit, or should I assume that I'll need to stick with (?PARNO) from now on whenever I want recursion in a regex? Fortunately, it's not a common need, as there are often much better methods for parsing, but I like to know whether I still have the option of rolling my own solution with regexes if I'm so inclined.
Thank you for your advice and suggestions.
1. our $x; $x = qr/(??{ $x })/; or some equivalent.
our $x = qr/(??{ $x })/; ought to work, and does in earlier versions of perl. However, if you want to be forward compatible with 5.10 and 5.12, you should predeclare the variable.
2. Sometime after 5.8.0 and on or before 5.10.0.
3. Only if you need recursive regular expressions in perl versions before 5.10, when (?PARNO) was introduced.
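Answer 1 in practice: a sketch that declares the pattern variable with our before assigning it, applied to the parenthesis cases from the start of the post. The variable name $paren_re and the anchored whole-string check are illustrative choices, not taken from the original code.

```perl
use strict;
use warnings;

# Declare the package variable first, then assign it. By the time the
# (??{ }) block runs during a match, $paren_re holds the finished
# pattern, so every level of nesting can recurse.
our $paren_re;
$paren_re = qr/
    \(
        (?:
            (?> [^()]+ )          # run of non-parenthesis characters
        |
            (??{ $paren_re })     # a nested balanced group
        )*
    \)
/x;

# Anchor the pattern so the whole string must be one balanced group.
my %verdict;
for my $s ('(())', '((()))', '(()', '(()()(()))') {
    $verdict{$s} = $s =~ /\A$paren_re\z/ ? 'balanced' : 'not balanced';
    print "$s: $verdict{$s}\n";
}
```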
Re: (??{ code }) versus (?PARNO) for recursive regular expressions
by ikegami (Patriarch) on Mar 25, 2011 at 22:32 UTC
by wind (Priest) on Mar 25, 2011 at 22:41 UTC
by ikegami (Patriarch) on Mar 25, 2011 at 22:47 UTC
by wind (Priest) on Mar 25, 2011 at 23:32 UTC
by ikegami (Patriarch) on Mar 25, 2011 at 23:46 UTC

Re: (??{ code }) versus (?PARNO) for recursive regular expressions
by moritz (Cardinal) on Mar 26, 2011 at 09:26 UTC

Re: (??{ code }) versus (?PARNO) for recursive regular expressions
by JavaFan (Canon) on Mar 26, 2011 at 10:44 UTC