tshabet has asked for the wisdom of the Perl Monks concerning the following question:

Since the monastary was so helpful with my last question, here's a followup/further question. I was trying to implement the
(??{CODE})
sequence for a matching regexp to match a string with balanced braces using this recursion regexp:
$np = qr{ \{ (?: (?> [^{}]+ ) | (??{ $np }) )* \} }x;
It turned out the problem I had was that I was using an old version of perl that didn't support the delayed interpolation sequence. So no problem, I upgraded and all is fine and dandy etc. But here's my question: Lets say that the sequence didn't exist, or that I'm using perl circa Billy Idol or something. I know that without this newer form of perl recursion a standard regex wouldn't be able to make this type of nested brace match(or so I've been told by coders more knowledgable than I). So my thinking went like this: When I finally decided I needed some monk advice for my previous problem, I had already invested a lot of time and energy into what I had hoped would be a handy and useful program. How could I have at least come close to the function of the newer sequence without using it? By this I mean, if I was unable to use the delayed interpolation sequence and had to replace the regex with some "old fashioned" code, what would my options be? I was very worried about this problem until my salvation yesterday, now its only a "what if" question for people who are in the know. Don't worry, everything is working fine :-) I was just wondering what, say, the author of the recdescent module did to achieve similar effect back in the dark ages.

Replies are listed 'Best First'.
Re: Balancing braces without (??{})
by John M. Dlugosz (Monsignor) on Jul 28, 2001 at 00:30 UTC
    See Parse::YAPP. This implements a full grammar, which is a superset of what can be done with a regex.

    Update: I just ran into Text::Balanced, which I suppose does exactly this. It's part of the Parse::RecDescent package. You should already have it if you've installed Switch.

    A true regular expression is represented by a connected graph of nodes, and you move from state to state with each input token. It can't count, because it has no memory at all other than knowing what state it's in.

    To count braces, you need a "Push-down automata", the next step up. It has a graph and a stack.

    So, you could implement a P.D.A. by starting with foreach $char (split ('',$input)) {, switch to the current state, and push/pop a @array for the state. More specifically, push the open symbol so you can check against it for a matching close.

    if $char is an open symbol, push it along with position. else if $char is a close symbol, pop the stack and verify that it's the matching type (error if not). popped position through current position is ballanced range. Save +that on answer list.
    Error if stack underflows. When out of input, make sure stack is also empty else warning. Return list of ballanced position ranges.

    —John

      Ah, the Push down automata, I knew it would return to haunt me someday. Applied theory with Professor Decker...how I miss it q-; Thanks for this very helpful info John. Actually, my previous solution was on a similar track but instead of implementing a stack to push and pop I was doing something more hackish and just using a variable that would increment for each { and decrement for each } and return the brace that brought the var back to zero. This was a very hackish approach to the problem, and your stack suggestion has obvious advantages aside from its more poetic approach. I'll take a look at Parse::YAPP, sounds like a useful read. Aside from the fact that (??{}) sequence makes the regex code more compact than implementing a stack, are there advantages or disadvantages in terms of runtime, memory, etc? I guess what I'm asking is, if you had perl 5.7 running on your machine and a rainy afternoon to spend writing the best code you could, which technique would you choose? I am very interested in making my code the best it can be, since I hope others will find it useful and want to use it as well. Gotta work on my skills before that hallowed day, though.
        I'd worry about the documentation that states, “WARNING: This extended regular expression feature is considered highly experimental, and may be changed or deleted without notice.”

        So, it might not work at all on Perl 5.8 when it comes out.

        If I had to write a good efficient implementation without using ??{}, I'd probably take what I showed you and do it in chunks rather than one char at a time. That is, chop the string into chunks where each chunk is a single ballancing char, or the whole run between them. Then perform the algorithm on that list, rather than every single char.

        Perhaps there should be a feature that elegantly does P.D.A., not just regexp, without getting into all the complexity of full LALR grammar. Perhaps stacking could be added as regex extenstions without generalized code blocks: a way to push something, and zero-width assertions and subexpressions that can refer to the stack top, and pop.

        —John

        Thinking about it again, I'd make a slight change. Instead of chopping it up first, I'd use a search for brace characters with /g as the condition of a while loop, so the body can examine pos() and push and pop and stuff, and all the uninteresting characters are skipped over.
(tye)Re: Balancing braces without (??{})
by tye (Sage) on Jul 28, 2001 at 01:55 UTC

    I'm fond of:

    sub isBalanced { my $text= shift; my $brace= @_ ? shift : "{}"; my $start= substr( $brace, 0, length($brace)/2 ); my $end= substr( $brace, length($brace)/2 ); 0 while $text =~ s/\Q$start\E[^\Q$brace\E]*\Q$end\E//g; return $text !~ /[\Q$brace\E]/; }
    or even
    sub notBalanced { my $brace= 1 < @_ ? $_[1] : "{}"; my $start= substr( $brace, 0, length($brace)/2 ); my $end= substr( $brace, length($brace)/2 ); my $depth= 0; for( split /[^\Q$brace\E]*/, $_[0] ) { if( index($start,$_) ) { $depth++; } elsif( index($end,$_) ) { return -1 if --$depth < 0; } } return $depth; }
    modifying these to allow doing something with the matches is left as an exercise (as is testing). (:

            - tye (but my friends call me "Tye")
Re: Balancing braces without (??{})
by I0 (Priest) on Jul 28, 2001 at 06:14 UTC
    $_ = "{a}{{b}}{{c}{{d}}}"; ($re=$_)=~s/((\{)|(\})|.)/${['(','']}[!$2]\Q$1\E${[')','']}[!$3]/gs; @$ = (eval{/$re/},$@); print join"\n",@$ unless $$[-1]=~/unmatched/;
    <code>