List of Questions

Preface

Recursive regular expressions using (??{ code }) are a part of my perl toolkit that I haven't used in 5+ years. However, I recently came across a forum post that inspired me to see if I could use them again for a regex challenge dealing with balanced parentheses.

I borrowed some old code from myself to get started and quickly hacked together a solution that has worked for me in the past. Unfortunately, the regex recursed only a single time. (()) matched, but ((())) would only match the inner two levels.

Going back to the drawing board at perlre, I decided to use the newer feature (?PARNO) to accomplish the task and was able to get it to work. However, this did not satisfy me, since I want to know more about my original problem with (??{ code }): is there something specific I'm doing wrong, and/or is there a version of perl in which it stopped working?

Example

The code below demonstrates my problem. It matches balanced braces and encloses them in a new tag that indicates how many characters they contain: <##> </##>. When using (??{ $braces_re }), only 2 levels of braces are matched, but using (?R) works as desired.

    my $data = do {local $/; <DATA>};

    my $braces_re = qr/
        \{
        (?:
            (?> [^{}]+ )
        |
            # Use either (?PARNO) or (??{ code }).
            #(?R)
            (??{ $braces_re })
        )*
        \}
        /sx;

    $data =~ s{($braces_re)}{
        my $len = length $1;
        "<$len> $1 </$len>"
    }eg;

    print $data;

    1;

    __DATA__
    { ... }
    { { } }
    { { { } } }
    { { } { { { } } } }
    { ... }

Output using (??{ $braces_re })

    <7> { ... } </7>
    <7> { { } } </7>
    { <7> { { } } </7> }
    { <3> { } </3> { <7> { { } } </7> } }
    <7> { ... } </7>

Output using (?R)

    <7> { ... } </7>
    <7> { { } } </7>
    <11> { { { } } } </11>
    <19> { { } { { { } } } } </19>
    <7> { ... } </7>
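
For comparison, here is a minimal sketch of the same substitution written with a numbered recursion, (?1), instead of (?R). It is not necessarily the (?PARNO) solution mentioned in the preface; the helper name tag_braces_parno and the pattern layout are assumptions made for illustration. Because the pattern carries its own capture group, it recurses into that group only, which should make it easier to embed in a larger regex than (?R). It needs perl 5.10 or later.

```perl
use strict;
use warnings;

# Group 1 is the balanced-brace group; (?1) re-enters it for nested braces.
my $braces_parno_re = qr/
    (                       # group 1: one balanced brace group
        \{
        (?:
            (?> [^{}]+ )    # run of non-brace characters, no backtracking
        |
            (?1)            # recurse into group 1
        )*
        \}
    )
/x;

# Hypothetical helper: wrap every top-level balanced group in a length tag.
sub tag_braces_parno {
    my ($text) = @_;
    $text =~ s{$braces_parno_re}{
        my $len = length $1;
        "<$len> $1 </$len>";
    }eg;
    return $text;
}

print tag_braces_parno("{ { { } } }"), "\n";   # <11> { { { } } } </11>
```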

Update using strictures

Now for the annoying bit. When originally writing this post, I wanted to report that I of course used strict and warnings, but I hadn't actually turned them on. Doing so reported this annoying error:

Global symbol "$braces_re" requires explicit package name at (re_eval +1) line 2.

I therefore quickly modified the script by predeclaring $braces_re:

    my $braces_re;
    $braces_re = qr/

And lo and behold, (??{ code }) now works as desired.
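
Put together, the predeclared version might look like the sketch below. The helper name tag_braces_deferred is hypothetical and the pattern is kept at file scope, as in the original script; treat it as an illustration of the fix under those assumptions rather than a definitive implementation.

```perl
use strict;
use warnings;

# Declare first, assign second, so the lexical already exists
# when the code block inside the pattern is compiled.
my $braces_re;
$braces_re = qr/
    \{
    (?:
        (?> [^{}]+ )        # run of non-brace characters
    |
        (??{ $braces_re })  # deferred: re-enter the same pattern
    )*
    \}
/x;

# Hypothetical helper name, introduced only for this sketch.
sub tag_braces_deferred {
    my ($text) = @_;
    $text =~ s{($braces_re)}{
        my $len = length $1;
        "<$len> $1 </$len>";
    }eg;
    return $text;
}

print tag_braces_deferred("{ { } { { { } } } }"), "\n";
```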

I can of course work with this limitation, but it's not completely satisfying, since the predeclaration was not necessary with version 5.8.0 or before. I also find it a little disconcerting that the code still works with the former method, if only to the first level of recursion.
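
One scoping rule that seems relevant here, offered as a possible explanation rather than a confirmed diagnosis: perlsub documents that a variable declared with my is not visible until the statement after its declaration. So in "my $braces_re = qr/ ... $braces_re ... /", the $braces_re written inside the pattern cannot be the lexical being declared. The sketch below shows the rule with a plain string; the names $label and show_scope are made up for the example.

```perl
use strict;
use warnings;

our $label = 'package variable';

sub show_scope {
    # The right-hand side is evaluated before the new lexical comes
    # into scope, so $label here still refers to the "our" variable.
    my $label = "initialised from the $label";
    return $label;
}

print show_scope(), "\n";   # initialised from the package variable
```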

Update, tracing change in: our $x = qr/(??{ $x })/

With ikegami's prodding, I remembered that my original working code used our as the declaration. Finding some old servers, I was able to trace the change in the way strictures validate each declaration:

    Perl version                                     our $x = $x;   our $x = qr/(??{ $x })/;   my $x = qr/(??{ $x })/;
    strawberry perl v5.12.1                          passes         fails                      fails
    perl v5.10.0 built for i686-linux                passes         fails                      fails
    perl v5.8.0 built for i386-linux-thread-multi    passes         passes                     passes

Please note that while the my declaration passes in v5.8.0, the recursive nature of the regex still fails in the same way it does currently, by only recursing 1 time.
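
To repeat this check on another perl, something like the following sketch could be used. It compiles each of the three declarations in its own string eval under strict and reports whether the eval succeeded. The package name StrictProbe is an assumption made for the example, and no expected results are given because they depend on the perl version running it.

```perl
use strict;
use warnings;

# The three declarations from the table above.
my @declarations = (
    'our $x = $x;',
    'our $x = qr/(??{ $x })/;',
    'my $x = qr/(??{ $x })/;',
);

my %result;
for my $decl (@declarations) {
    # Each declaration gets its own string eval, so a strict failure
    # in one does not affect the others. StrictProbe is a throwaway
    # package name used only by this sketch.
    my $ok = eval "package StrictProbe; use strict; $decl 1;";
    $result{$decl} = $ok ? 'passes' : 'fails';
}

print "perl $]\n";
printf "%-28s %s\n", $_, $result{$_} for @declarations;
```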

Summary

Obviously, this feature is still documented as experimental, but I'm curious what other monks have to say or suggest about it. Can it still be something that I keep in my toolkit, or should I just assume that I'll need to stick with (?PARNO) from now on when I want recursion in a regex? Fortunately, it's not a common need, as there are often much better methods for parsing, but I like to know whether or not I still have the option of rolling my own solution using regexes if I'm so inclined.

Thank you for your advice/suggestions

Answers


In reply to (??{ code }) versus (?PARNO) for recursive regular expressions by wind
