We all know by now that regexes have problems with arbitrarily delimited data. This little snippet demonstrates a small subroutine that determines whether or not your delimiters are balanced. It also allows you to specify an escape character, if needed.

Just pass the sub your string, left delimiter, right delimiter, and an optional escape character.

As a side note, any regexophiles want to tell me how I can get that frickin' dot star out of there? I worked on it for hours, but that was the easiest solution :(

I remember seeing an earlier version of this in the form of while ( $str =~ s/\(([^)]+)\)/$1/g ){}; It didn't allow for escaped characters, and I can't remember where I found it. If any monks know the origin, I'd like to be able to credit the person who had the initial idea.

#!/usr/bin/perl -w
use strict;

my @strings = ( '(a(a(b)a)a)',         # balanced
                '(a(a(b)a)a)a)',       # too many right parens
                '(a(a(b)a\)a)a)',      # balanced -- one right paren is escaped
                '(a(a\(a(a(b)a)a)',    # unbalanced -- still have too many left parens
                '(a(a\(a(b)a)a)' );    # balanced -- one left paren is escaped

foreach my $string ( @strings ) {
    my $balanced = balanced_delimiters( $string, "(", ")", "\\" );
    print "Delimiters in $string are ";
    if ( $balanced ){
        print "balanced.\n";
    } else {
        print "not balanced.\n";
    }
}

sub balanced_delimiters {
    # $escape is optional.  If not supplied, no escape character will
    # be recognized.
    my $str    = shift;
    my $left   = quotemeta shift;
    my $right  = quotemeta shift;
    my $escape = quotemeta shift;

    my $unescapedLeft  = "(?<!$escape)$left";
    my $unescapedRight = "(?<!$escape)$right";
    my $middle         = "."; # AAARRRRGGGGHHHH!!!!

    my $regex = "$unescapedLeft($middle*?)$unescapedRight";
    while ($str =~ s/$regex/$1/gs){};
    return $str =~ /$unescapedLeft|$unescapedRight/ ? 0 : 1;
}

Replies are listed 'Best First'.
Re (tilly) 1: Determining if you have balanced delimiters
by tilly (Archbishop) on Nov 21, 2000 at 18:07 UTC
    It is very easy to remove the need for the .* - just don't use an RE solution. :-)

    Though I find this cute, using an RE for this problem is like using a hammer as a screwdriver. It kinda works. But when you need to solve a slightly more delicate problem, it doesn't really. In this instance suppose we have several types of matched delimiters, (), [] and {}. Now go forth and check whether a given body of text balances! The RE fails horribly.

    This is not a failing of REs, it is a matter of them being used outside of their area of competency.

    REs are designed to state patterns that you can search a string for. They are bits and pieces that can be recognized and are really handy for that. However, people like them so much that they try to use them to demonstrate that a document fits some format. This is a different kind of problem. What you want to do now is start thinking of this as a parsing problem, looking for tokens, etc.

    As soon as you make that shift, the harder version of this problem becomes trivial. Maintain a stack of open tokens; when you come to closing ones, pull them off and see if they match. If not, then you have a problem. When you come to the end, if any are left open, you have another problem. This is conceptually easy and readily extensible.
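The stack idea might be sketched roughly as follows. This is only an illustration, not code from the thread: the name balanced_by_stack, its argument layout (a hash reference mapping each opener to its closer), and the restriction to single-character delimiters are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): stack-based balance check.
# $pairs is a hashref such as { '(' => ')', '[' => ']', '{' => '}' }.
# Assumes single-character delimiters and a single-character escape.
sub balanced_by_stack {
    my ( $text, $pairs, $escape ) = @_;
    my %closer_for = %$pairs;
    my %is_closer  = map { $_ => 1 } values %closer_for;
    my @open;                                  # stack of currently open delimiters
    my @chars = split //, $text;

    for ( my $i = 0; $i < @chars; $i++ ) {
        my $c = $chars[$i];
        if ( defined $escape && $c eq $escape ) {
            $i++;                              # skip the escaped character
            next;
        }
        if ( exists $closer_for{$c} ) {
            push @open, $c;                    # opener: remember it
        }
        elsif ( $is_closer{$c} ) {
            return 0 unless @open;             # closer with nothing open
            my $opener = pop @open;
            return 0 unless $closer_for{$opener} eq $c;   # wrong kind of closer
        }
    }
    return @open ? 0 : 1;                      # anything still open is an error
}
```

Called as balanced_by_stack( '(a[b]{c})', { '(' => ')', '[' => ']', '{' => '}' }, "\\" ), this should report the text as balanced, while '(a[b)]' is rejected because the closer does not match the most recent opener.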

Re: Determining if you have balanced delimiters
by I0 (Priest) on Dec 05, 2000 at 11:00 UTC
    sub balanced_delimiters {
        local $_ = shift;
        my %sub;
        @sub{@_} = ("(",")","\\");
        my $LRS=join'|',map quotemeta,@_;
        {local $^W=0;s/($LRS|.)/$sub{$1}/gs; eval{m/$_/} }
        return !$@;
    }
      Darned clever. Too bad that:
      balanced_delimiters('(\a)', '(', ')', "\\")
      returns the wrong thing. Perhaps you meant this?
      sub balanced_delimiters {
          local $_ = shift;
          my %sub;
          @sub{@_} = ("(",")");
          my $LRS=join'|',map quotemeta,@_;
          $LRS .= "($LRS)";
          {local $^W=0;s/($LRS|.)/$sub{$1}/gs; eval{m/$_/} }
          return !$@;
      }
      A behaviour note: with this version, the escape sequence properly escapes the escape sequence itself.
        Thank you for catching that, it was meant more like
        @sub{@_} = ("(",")");
        my $LRS=join'|',map quotemeta,@_;
        {local $^W=0;s/($LRS.|.)/$sub{$1}/gs; eval{m/$_/} }
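For readers untangling the trick above, here is a longer-winded sketch of the same idea: reduce the string to a skeleton of plain parentheses, then let the regex compiler decide whether that skeleton is well formed. The name balanced_by_regex_compile and the leading \A anchor (added so an empty skeleton still compiles as an ordinary pattern) are assumptions of this sketch, not part of I0's code.

```perl
use strict;
use warnings;

# Hypothetical, spelled-out variant of the "compile it as a regex" trick.
sub balanced_by_regex_compile {
    my ( $str, $left, $right, $escape ) = @_;

    # Map the caller's delimiters onto real parentheses.
    my %paren = ( $left => '(', $right => ')' );

    # Token alternatives: escape-plus-any-character first, so an escaped
    # delimiter (or an escaped escape) is swallowed as one unit.
    my @alts = map { quotemeta($_) } ( $left, $right );
    unshift @alts, quotemeta($escape) . '.'
        if defined $escape && length $escape;
    my $token = join '|', @alts;

    # Keep only unescaped delimiters, rewritten as ( and ).
    ( my $skeleton = $str ) =~ s/($token|.)/exists $paren{$1} ? $paren{$1} : ''/gse;

    # Unbalanced parentheses make the pattern fail to compile, which
    # dies inside the eval and leaves $ok undefined.
    my $ok = eval { my $re = qr/\A$skeleton/; 1 };
    return $ok ? 1 : 0;
}
```

With this sketch, balanced_by_regex_compile( '(\a)', '(', ')', "\\" ) should come back true, since the escaped "a" is simply dropped from the skeleton.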
Re: Determining if you have balanced delimiters
by merlyn (Sage) on Nov 21, 2000 at 17:52 UTC
    Your ARRRGH is correct. I believe your code will incorrectly pass the string "(()" because the second "(" qualifies for "." in the middle.

    What you need is a dot that doesn't match any of your other items of interest. Perhaps (untested):

    $middle = "(?:(?!$left|$right|$escape$left|$escape$right).)";

    -- Randal L. Schwartz, Perl hacker

      No, it works. It was supposed to do that. :-)

      The idea is that it repeatedly matches a pair and removes it while it can. When it is done, it then checks if any were left over.
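      To watch that reduction happen, a small tracing sketch may help. It is not from the original post: the name reduction_steps is made up, and the pattern is hard-coded for "(", ")" and a backslash escape rather than built from arguments.

```perl
use strict;
use warnings;

# Hypothetical tracer: returns the string as it looks after each pass of
# the pair-stripping substitution, starting with the untouched input.
sub reduction_steps {
    my ($str) = @_;
    my @steps = ($str);
    # Same shape as the original regex: unescaped "(", lazy middle,
    # unescaped ")"; each pass keeps only the middle.
    while ( $str =~ s/(?<!\\)\((.*?)(?<!\\)\)/$1/gs ) {
        push @steps, $str;
    }
    return @steps;
}

print join( ' -> ', reduction_steps('(a(b)a)') ), "\n";
# prints: (a(b)a) -> a(ba) -> aba
```

      For '(()' the trace ends with a lone '(', which is exactly what the final leftover check then reports as unbalanced.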

        Well, in that case, it's another problem, since the dot can match the escape character, so it'll incorrectly blast "(\)".

        I knew it was something. {grin}

        -- Randal L. Schwartz, Perl hacker

Re: Determining if you have balanced delimiters
by metaperl (Curate) on Nov 21, 2000 at 19:07 UTC
    I assume this code is an exercise of your regexp muscles (no doubt a worthy thing to do), because we already have Text::Balanced.
(Ovid) Re: Determining if you have balanced delimiters
by Ovid (Cardinal) on Nov 21, 2000 at 20:22 UTC
    This code wasn't really intended to be a serious method of dealing with this problem, since it is inappropriate here. For many situations, my beloved regexes can't match the power of Text::Balanced or other solutions, but for some sick reason, I view them as a fun logic puzzle :)

    In retrospect, I should have just put this in seekers and just asked monks how they would approach the . problem. The above comments about the limitation of this approach were correct. I just want to figure out how to get rid of that frickin' dot! merlyn's proposal (which he acknowledged was untested) didn't pan out:

    $middle = "(?:(?!$left|$right|$escape$left|$escape$right).)";

    Cheers,
    Ovid

    Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: Determining if you have balanced delimiters
by agoth (Chaplain) on Nov 21, 2000 at 14:43 UTC
    What a fantastic bit of code, thanks for that!!!! ++

Re: Determining if you have balanced delimiters
by agoth (Chaplain) on Dec 14, 2000 at 19:38 UTC
    A bit late in the day, but delimiters in #29, if you haven't seen it: One liner