Recently I was investigating a bug in blead-perl (the latest development release of perl) where some of the Regexp::Common tests were failing. One of the failing tests was for matching an arbitrarily nested balanced pattern, something that you can't actually do with a true regular expression. Perl only allows it because of the (??{}) syntax, which executes a piece of code and then uses its return value as a pattern to be matched as though it were part of the original pattern. If the returned pattern itself contains a (??{}), the matching can become recursive, such as when the returned object is the original pattern. An example is this:

our $qr = qr/<(?:[^<>\\]+|\\.|(??{$qr}))*>/;
print "not " unless '<<><><<<>>><>>' =~ /^($qr)$/;
print "ok - $1\n";
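To see the recursion at work on more than one input, here is a small self-contained sketch. The helper name is_balanced and the sample strings are mine, and I am assuming a negated class [^<>\\] for the non-bracket text, so treat it as an illustration rather than the exact Regexp::Common pattern.

```perl
use strict;
use warnings;

# Declared before the qr// so the (??{ ... }) block can refer to it
# under strict; it has to be a package variable, not a lexical.
our $balanced;
$balanced = qr/<(?:[^<>\\]+|\\.|(??{ $balanced }))*>/;

# Hypothetical helper: true when the whole string is one balanced <...> group.
sub is_balanced {
    my ($text) = @_;
    return $text =~ /\A$balanced\z/ ? 1 : 0;
}

for my $sample ('<<><><<<>>><>>', '<a<b>c>', '<<>', '<>>') {
    printf "%-16s %s\n", $sample,
        is_balanced($sample) ? 'balanced' : 'not balanced';
}
```

Every time the engine reaches the (??{ $balanced }) it runs the code block and matches the returned pattern in place, which is what lets the nesting go arbitrarily deep.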

Now, the idea I had was to be able to write the above like this (with suitable handwaving about the exact notation):

print "not " unless '<<><><<<>>><>>' =~ /^((?&:<(?:[^<>\\]+|\\.|(?&))*>))$/;
print "ok - $1\n";

The idea is that the (?&: ... ) marks a subsection of the pattern that can be recursed into, and (?&) would mean recurse to the (?&:...) part of the pattern. This way the statement is self-contained, requires no perl code evaluation, and requires only one compiled regexp per pattern, instead of many as the current scheme dictates (embedding a qr// in a larger pattern results in a complete recompile).

An extension of this would be to allow such subsections to be named, maybe (?&name:...) and (?&name), which would I think allow some very Perl6-rule-like behaviour. The addition of a matches-nothing block, say (?&& ... ), would make it possible to define a bunch of rules and then reuse them in other patterns.

my $rules = qr/(?&&                # compile this stuff, but don't match it
                 (?&foo: .... )    # define ...
                 (?&bar: .... )    # ... some rules
               )
              /x;
if ($blah =~ /(?&foo)(?&bar)$rules/) { ... }
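For comparison, the closest thing I can see with the existing (??{}) machinery is a table of qr// objects that get looked up at match time. A rough sketch, where the rule names number and word, their bodies, and the helper sub are all made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical rule table; names and bodies are placeholders only.
our %rule = (
    number => qr/\d+/,
    word   => qr/[a-z]+/,
);

# Each (??{ ... }) fetches a rule when the match reaches that point
# and matches the returned pattern in place.
sub matches_number_then_word {
    my ($text) = @_;
    return $text =~ /\A(??{ $rule{number} })(??{ $rule{word} })\z/ ? 1 : 0;
}

print matches_number_then_word('123abc') ? "ok\n" : "not ok\n";
```

This works, but it needs perl code to run during the match and a separate compiled regexp per rule, which is exactly the overhead the proposed notation is meant to avoid.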

As far as I understand it, adding this kind of thing to perl5's regex engine wouldn't be particularly difficult. It would only require the addition of a regop or two, and some additional code in the optimiser. Most of the infrastructure to handle (??{}) can be reused, so the main thing is dealing with nesting, forward declarations and things like that, stuff I don't think would be too hard to handle. Note that all of this assumes the current behaviour of (??{}) WRT capturing parens: the ones that matter are in the top level pattern only. (Although maybe that assumption can be relaxed... I don't know...)

Anyway, I was just curious what people thought of this.

---
$world=~s/war/peace/g


In reply to Perl6ish rules in Perl5's regex engine by demerphq
