I found your post while working on a problem of my own involving (??{ code }) versus (?PARNO) for recursive regular expressions. Along the way I solved your original problem using both (??{ code }) and (?PARNO), the latter of which had not been introduced at the time of your question. I'm sharing both solutions here in case anyone else happens by this thread.

#!/usr/bin/perl

use strict;
use warnings;

# Recursion via (??{ code }): the pattern re-inserts itself at match time.
# It has to be a package variable declared before the qr{} that refers to it.
our $code_re;
$code_re = qr{
    \(
    (?:
        (?>[^()]+)
    |
        (??{$code_re})
    )*
    \)
}x;

# Recursion via (?PARNO): (?-1) recurses into the enclosing capture group.
our $parno_re = qr{
    (
        \(
        (?:
            (?>[^()]+)
        |
            (?-1)
        )*
        \)
    )
}x;

while (<DATA>) {
    chomp;
    # One column per technique; a line is balanced only if the whole string matches.
    print /^(?:(?>[^()]+)|$code_re)*$/  ? 'PASS'  : '    ';
    print /^(?:(?>[^()]+)|$parno_re)*$/ ? ' PASS' : '     ';
    print " $_\n";
}

__DATA__
+ Cont(ains balanced( nested Br(ack)ets )in t)he text
- Con(tains i(mbalan(ced Br(ack)ets, )one c)lose missing
- Contains i(mbalan(ced Br(ack)ets, )one op)en m)missing
+ No brackets in this string
- Won)ky br(ackets in) this s(tring
- More wonky br(ackets in) th)is s(tring
- Just the one( leading bracket
- And just th)e one trailing bracket
+ So(me m(ultip)le n(est(s in) thi)s o)ne
+ Ther(e is( mo(re) de(e)p )nes(ti(n(g i)n (mul)ti)p(l)es) he)re
+ Some d((oub)le b)rackets
+ ab(())cde
+ ab()()cde
- ab(c(d)e
- ab(c)d)e
Output

PASS PASS + Cont(ains balanced( nested Br(ack)ets )in t)he text
          - Con(tains i(mbalan(ced Br(ack)ets, )one c)lose missing
          - Contains i(mbalan(ced Br(ack)ets, )one op)en m)missing
PASS PASS + No brackets in this string
          - Won)ky br(ackets in) this s(tring
          - More wonky br(ackets in) th)is s(tring
          - Just the one( leading bracket
          - And just th)e one trailing bracket
PASS PASS + So(me m(ultip)le n(est(s in) thi)s o)ne
PASS PASS + Ther(e is( mo(re) de(e)p )nes(ti(n(g i)n (mul)ti)p(l)es) he)re
PASS PASS + Some d((oub)le b)rackets
PASS PASS + ab(())cde
PASS PASS + ab()()cde
          - ab(c(d)e
          - ab(c)d)e
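
For anyone who wants to double-check the +/- markers in the test data without relying on either regex feature, here is a minimal sketch using a plain depth counter. It is not from the original thread: is_balanced is a name I made up for illustration, and it only decides balanced versus not balanced, it doesn't capture anything.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not part of the original post): returns true when every
# '(' is closed by a later ')' and no ')' appears before its matching '('.
sub is_balanced {
    my ($text) = @_;
    my $depth = 0;
    for my $ch (split //, $text) {
        if ($ch eq '(') {
            $depth++;
        }
        elsif ($ch eq ')') {
            # A close bracket with nothing open means the string is imbalanced.
            return 0 if --$depth < 0;
        }
    }
    # Anything still open at the end is also an imbalance.
    return $depth == 0 ? 1 : 0;
}

# A few of the strings from the test data above.
for my $line ('ab(())cde', 'ab()()cde', 'ab(c(d)e', 'ab(c)d)e') {
    printf "%-4s %s\n", (is_balanced($line) ? 'PASS' : ''), $line;
}
```

The first two strings print with PASS and the last two without, which agrees with both regex columns above.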

I also tried to get the tracking of captured groups working, as ikegami and others demonstrated. I can't think of a way to make it work with (?PARNO), but I did get a working solution using (??{ code }). The biggest problem I noticed behind your doubled results was that you didn't have enough (?> ) atomic groups to prevent backtracking.

# Uses the same __DATA__ section as the script above.
our @matches;
our $code_re;
$code_re = qr{
    (
        \(
        (?:
            (?>[^()]+)
        |
            (??{$code_re})
        )*
        \)
    )
    (?{ push @matches, $1 })
}x;

while (<DATA>) {
    chomp;
    @matches = ();
    # On failure, discard anything pushed by partial matches along the way.
    print /^(?:(?>[^()]+)|$code_re)*$/ ? 'PASS' : (@matches = (), '    ');
    print " $_\n";
    print " $_\n" foreach @matches;
}

It might be an interesting challenge to get the results to print out aligned with where they were captured, but this is the limit of the goofing off I'm going to do for now :).


In reply to Re: Recursive regular expression weirdness by wind
in thread Recursive regular expression weirdness by johngg
