Recently I was investigating a bug in blead-perl (the latest development release of perl) where some of the Regexp::Common tests were failing. One of the failing tests was for matching an arbitrarily nested balanced pattern, something that you can't actually do with a true regular expression. Perl only allows it because of the (??{}) syntax, which executes a piece of code and then uses its return value as a pattern to be matched as though it were part of the original pattern. If the returned pattern itself contains a (??{}), the matching can become recursive, such as when the returned object is the original pattern. An example is this:

    our $qr = qr/<(?:[^<>\\]+|\\.|(??{$qr}))*>/;
    print "not " unless '<<><><<<>>><>>' =~ /^($qr)$/;
    print "ok - $1\n";
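To see that the (??{}) recursion really does enforce balance, here is a minimal sketch that runs the pattern over a few strings. The is_balanced helper and the extra test strings are made up for illustration and are not part of Regexp::Common. Note that it relies on the negated class [^<>\\], so the first alternative cannot swallow the brackets themselves.

```perl
use strict;
use warnings;

# The first alternative uses a negated class, so it only consumes
# characters that are neither brackets nor backslashes.
our $qr;
$qr = qr/<(?:[^<>\\]+|\\.|(??{ $qr }))*>/;

# Hypothetical helper: true if the whole string is one balanced <...> group.
sub is_balanced {
    my ($str) = @_;
    return $str =~ /^$qr$/ ? 1 : 0;
}

for my $str ('<<><><<<>>><>>', '<a<b>c>', '<<>', '<>>') {
    printf "%-16s %s\n", $str, is_balanced($str) ? 'balanced' : 'not balanced';
}
```

Every nested group is matched by re-entering $qr through the (??{}) block, which is exactly the perl-level evaluation the rest of this post would like to avoid.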

Now, the idea I had was to be able to write the above like this (with suitable handwaving about the exact notation):

    print "not " unless '<<><><<<>>><>>' =~ /^((?&:<(?:[^<>\\]+|\\.|(?&))*>))$/;
    print "ok - $1\n";

The idea is that the (?&: ... ) marks a subsection of the pattern that can be recursed into, and (?&) means recurse to the (?&: ... ) part of the pattern. This way the statement is self-contained, requires no perl evaluation to occur, and requires only one compiled regexp per pattern, instead of many as the current scheme dictates (embedding a qr// in a larger pattern results in a complete recompile).
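The (?&: ... ) and (?&) notation is only a proposal. For comparison, perl 5.10 and later provide numbered group recursion, (?1), which gives a similarly self-contained pattern. The following is a minimal sketch assuming perl 5.10 or newer. It is not the notation proposed here, just an existing feature that is close in spirit.

```perl
use strict;
use warnings;
use 5.010;

# (?1) recurses into capture group 1, so no (??{}) code block and no
# package variable are needed. Assumes perl 5.10+.
my $balanced = qr/^(<(?:[^<>\\]+|\\.|(?1))*>)$/;

for my $str ('<<><><<<>>><>>', '<<>') {
    if ($str =~ $balanced) {
        print "ok - $1\n";
    }
    else {
        print "not ok - $str\n";
    }
}
```

One caveat with numbered recursion: the group number is fixed in the pattern text, so embedding this qr// in a larger pattern that adds capture groups before it would make (?1) point at the wrong group.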

An extension of this would be to allow such subsections to be named, maybe (?&name:...) and (?&name), which would I think allow some very Perl 6 rule-like behaviour. Adding a block that matches nothing, say (?&& ... ), would make it possible to define a bunch of rules and then reuse them in other patterns.

    my $rules = qr/(?&&       # compile this stuff, but don't match it
        (?&foo: .... )        # define ...
        (?&bar: .... )        # ... some rules
    )/x;

    if ($blah =~ /(?&foo)(?&bar)$rules/) { ... }
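For anyone wanting to experiment with this style, perl 5.10 and later have a (?(DEFINE)...) block with named groups, which behaves much like the rules block sketched above. The following is a minimal sketch assuming perl 5.10 or newer. The rule names word and num and their bodies are invented for illustration, since the rules above are left unspecified.

```perl
use strict;
use warnings;
use 5.010;

# A block that defines named rules but never matches anything itself.
# The rules 'word' and 'num' are hypothetical examples.
my $rules = qr/(?(DEFINE)
    (?<word> [a-z]+ )    # a run of lowercase letters
    (?<num>  [0-9]+ )    # a run of digits
)/x;

# (?&word) and (?&num) call the rules pulled in by interpolating $rules.
my $blah = 'abc123';
if ($blah =~ /\A(?&word)(?&num)\z$rules/) {
    print "ok - matched $blah\n";
}
```

Interpolating $rules still recompiles the combined pattern, so this shows the reuse side of the idea rather than the single-compile benefit.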

As far as I understand it, adding this kind of thing to perl5's regex engine wouldn't be particularly difficult. It would only require the addition of a regop or two, and some additional code in the optimiser. Most of the infrastructure that handles (??{}) can be reused, so the main work is dealing with nesting, forward declarations and things like that, which I don't think would be too hard to handle. Note that all of this assumes the current behaviour of (??{}) with respect to capturing parens: the ones that matter are in the top level pattern only. (Although maybe that assumption can be relaxed... I don't know...)

Anyway, I was just curious what people thought of this.