princepawn has asked for the wisdom of the Perl Monks concerning the following question:

I was visiting Mark-Jason Dominus' homepage for some tidbits of wisdom, and noticed a regex that should indicate whether a string has all parentheses matched or not:
$_='a(b(c)d(e(f)g)h)i(j(k)l(m)n)o'; $_='a(b(c)d(e(f(g)h)i(j(k)l(m)n)o)'; print 1 while s/[(][^()][)]//;
However, I dont know how to use this code. The second binding $_ should return a 0 and the first should return a 1. the print is an addition by me and I see it's fault: it is saying while this works print a 1, but I dont know a way to say when this doesnt work print something other than a 1 so I know that the string has imbalanced parens somewhere

Replies are listed 'Best First'.
Re: getting a return code from a looping regular expression
by BlaisePascal (Monk) on Aug 01, 2000 at 17:40 UTC
    The regex in the while works by removing all balanced pairs of parentheses, from the inside-out. Your code with the "print" should print 1111111 for both strings. The difference is that at the end, the first string will become 'aio' and the second will become 'aio)'. Something like:
    @foo = ('a(b(c)d(e(f)g)h)i(j(k)l(m)n)o', 'a(b(c)d(e(f)g)h)i(j(k)l(m)n)o'); for (@foo) { print "\'$_\' has "; while s/[(][^()][)]//; # get rid of matched () print "un" if /[()]/; # see if any unmatched print "balanced parentheses" }
    should work.

      while s/[(][^()][)]//; is actually a syntax error. Also, that pattern only matches parenthesis with exactly one character between them.

      Here's some working code for arbitrary parenthesized strings:

      sub balanced { $_ = shift; while (s/[(][^()]*[)]//) {}; return 0 if /[()]/; return 1; }

      Of course a real programmer uses a PDA to balance parenthesis. (;

      sub balanced { my $string = shift; my $stack = 0; for my $index (0..length($string) - 1) { my $char = substr($string, $index, 1); if ($char eq '(') { $stack++; } elsif ($char eq ')') { $stack--; }
                      # Update: Ha!  Thanks, BlaisePascal.
                      # A PDA that happily pops an empty
                      # stack isn't very useful. :)
      return 0 if $stack < 0; } return $stack ? 0 : 1; }

      -Matt

        True, PDA's are the proper tool for balancing parenthesis. But it helps if it works... Your PDA would match "))((" perfectly find. Most wouldn't consider that balanced...
        sub balanced { my $string = shift; my $stack = 0; @string = split //, $string; for (@string) { last if $stack < 0; $stack++ if /[(]/; $stack-- if /[)]/; } return $stack ? 0 : 1; }
        or, without using split and //:
        sub balanced { my $string = shift; my $stack = 0; for my $i (0..length($string)-1) { last if $stack < 0; $stack++ if substr($string,$i,1) eq '('; $stack-- if substr($string,$i,1) eq ')'; } return $stack ?0 : 1; }
      hmmm...code produced a syntax error for me on the while line.
      @foo = ('a(b(c)d(e(f)g)h)i(j(k)l(m)n)o', 'a(b(c)d(e(f)g)h)i(j(k)l(m)n)o)'); for (@foo) { print "\'$_\' has "; while ( s/\([^()]*\)// ){} print "un" if /[()]/; print "balanced parentheses\n"; }
      Just a few minor modifications.

      While you can use an algorithm like this to find if a string has matches parens, you can't write a single regexp that does it. Check out this node for a bit of an explination for this.

      /\/\averick

Re: getting a return code from a looping regular expression
by splinky (Hermit) on Aug 01, 2000 at 23:43 UTC
    Utilizing Perl 5.6 regex features, you can do the following:

    our $parens; my $re = qr{ ( \( (?{$parens++ if $parens >= 0}) | \) (?{$parens-- if $parens >= 0}) | . )* }x; my $this = 'a(bc(de)fg)h'; my $that = '(a(bc(de)fg)h'; my $other = 'a(bc(de)fg)h)'; my $bla = ')a(bc(de)fg)h)'; $parens = 0; print "this, $parens\n" if $this =~ /$re/; $parens = 0; print "that, $parens\n" if $that =~ /$re/; $parens = 0; print "other, $parens\n" if $other =~ /$re/; $parens = 0; print "bla, $parens\n" if $bla =~ /$re/;

    Which prints:

    this, 0 that, 1 other, -1 bla, -1

    Specifically, -1 indicates that, at some point, an attempt was made to close parens that hadn't been opened. A positive number indicates how many unclosed left parens were encountered.

    Thanks, Ilya.

    *Woof*