Kanishka has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to remove parentheses (and whatever they enclose) from a string using only regular expressions (without using if conditions or loops)? It should work with any number of parentheses.

E.g.:
Input: "111(22(33)44)55"
Output should be: "11155"

thanks.

Re: regular expression paranthesis remover
by davido (Cardinal) on Jun 25, 2004 at 07:55 UTC

    This one's a freebie, thanks to Regexp::Common. You can read about using this module to match balanced parens by seeing the docs for Regexp::Common::balanced.

    use strict;
    use warnings;
    use Regexp::Common qw/balanced/;

    my $string = '111(22(33)44)55';
    $string =~ s/$RE{balanced}{-parens=>'()'}//g;
    print $string, "\n";

    You can thank TheDamian and Abigail-II for all the hard work that went into this module.

    Note: A regexp solution to matching balanced parens is probably less robust than a proper balanced text parser. For that, you could have a look at Text::Balanced.
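    For readers who want to try the Text::Balanced route mentioned above, here is a minimal sketch. It is not from the original post: the helper name strip_balanced is hypothetical, and it does use a loop, so it sidesteps the "regex only" constraint in the question. It assumes extract_bracketed in list context returns the extracted group, the remainder, and the skipped prefix, as the module documentation describes.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: drop every balanced (...) group from $text.
sub strip_balanced {
    my ($text) = @_;
    my $out = '';
    while (length $text) {
        # Skip a prefix of non-paren characters, then try to
        # extract one balanced group that follows it.
        my ($group, $rest, $skipped) =
            extract_bracketed($text, '()', '[^()]*');
        if (defined $group) {
            $out .= $skipped;   # keep the text before the group
            $text = $rest;      # continue after the group
        } else {
            $out .= $text;      # no further balanced group: keep the rest
            last;
        }
    }
    return $out;
}

print strip_balanced('111(22(33)44)55'), "\n";
```

    Unbalanced input is left in place from the point where extraction fails, which may or may not be what you want.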


    Dave

Re: regular expression paranthesis remover
by Zaxo (Archbishop) on Jun 25, 2004 at 08:09 UTC

    If you're unconcerned with their balancing except for the outer ones,

    $_ = '111(22(33)44)55';
    my $re = qr/
        (.*?)   # minimal grab up to the first...
        \(      # literal left paren
        .*      # greedy skip everything up through...
        \)      # the last literal right paren
        (.*)$   # then grab everything remaining
    /x;
    s/$re/$1$2/;
    print;
    If they do balance, it works, too. What do you want to happen if they don't balance?

    Update: Typo repaired, thanks, Hofmator.

    After Compline,
    Zaxo

      That's needlessly complicated; simply replace the stuff you don't want with nothing (as Anonymous Monk already said): s/\(.*\)//.

      The balancing caveat still applies, of course.

      Update: thospel is completely correct, my solution above (and Zaxo's) doesn't work for multiple parenthesized groups in the same string. To fix this (and actually make it recognise balanced parens):

      $_ = "111(22(33)44)55";
      1 while s/ \( [^()]* \) //gx;
      if (/[()]/) {
          print "unbalanced!!";
      } else {
          print;
      }
      BTW, I just noticed a small typo in Zaxo's code, it doesn't work as it stands. It should be my $re = qr/.../;.

      -- Hofmator

        Fails on the perfectly balanced "12(34)56(78)9"

        Assuming it's about removing balanced parentheses, I'd go for:

        my $balance;
        # Notice declaring $balance beforehand is important,
        # otherwise you pick up the global inside (??{})
        $balance = qr/(?:[^()]|\((??{$balance})\))*/;
        $string =~ s/\($balance\)//g;
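        As a side note that is not part of the original reply: on Perl 5.10 or later, the same recursive idea can probably be written without (??{}) by using the (?R) whole-pattern recursion. A minimal sketch, with strip_recursive as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper; assumes Perl 5.10+ for (?R) and
# possessive quantifiers (++ and *+).
sub strip_recursive {
    my ($s) = @_;
    # A group is "(", then any mix of non-paren runs and
    # nested groups (the pattern recursing into itself), then ")".
    $s =~ s/\((?:[^()]++|(?R))*+\)//g;
    return $s;
}

print strip_recursive('12(34)56(78)9'), "\n";    # 12569
print strip_recursive('111(22(33)44)55'), "\n";  # 11155
```

        Parens that do not belong to a balanced group are left in the string, so a check such as /[()]/ afterwards can still flag unbalanced input.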
Re: regular expression paranthesis remover
by Enlil (Parson) on Jun 25, 2004 at 07:48 UTC
    Certainly... it goes like so:
    $_ = '1()()()2923()())))((';
    s/[()]+//g;
    print;

    Update: I misread the question. You want to remove balanced parens along with their contents: use Regexp::Common::balanced.

    Update 2: There is also the solution based on the balanced-parens regex in perlre, i.e.:

    $string = '111(22(33))44554';
    # [^()]+ inside the atomic group, as in perlre (the + belongs outside the character class)
    $re = qr/\((?:(?>[^()]+)|(??{$re}))*\)/;
    $string =~ s/$re//;
    print $string;

    -enlil

Re: regular expression parentheses remover
by Roy Johnson (Monsignor) on Jun 25, 2004 at 14:50 UTC
    Just another way to do it without loops (kind of cheating):
    my $c=0; s/ \( (?{++$c}) | \) (?{--$c}) | ([^()]+) /$1 if ($c==0)/gex;
    If you match an open paren, increment a counter;
    if you match a close paren, decrement it;
    if you match a string of non-parens, capture it;
    substitute whatever was captured back in if the counter is zero.

    There is an "if", but it can be replaced by an "x" operator, if you're a stickler. Or you could rewrite the replacement as $c==0 and $1.

    Another little variation that doesn't rely on the (?{}) construct:

    s{([()]*)([^()]+)} {$c += $1=~y/(// - $1=~y/)//; $2 x !$c }ge;
    Match any parens followed by any non-parens;
    Add the count of opens and subtract the count of closes from your counter;
    Substitute the non-paren portion back in if the counter is zero.
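    The counter idea described above can be sketched as a small self-contained script. This is a variation rather than the exact code from the post: it matches one paren at a time so that trailing parens are consumed too, and the helper name strip_by_depth is hypothetical. It assumes the input is balanced; a stray ")" drives the counter negative and suppresses later text.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the text found at nesting depth zero.
sub strip_by_depth {
    my ($s) = @_;
    my $depth = 0;
    $s =~ s{ ([()]) | ([^()]+) }{
        if (defined $1) {
            # A paren adjusts the depth and is itself dropped.
            $depth += ($1 eq '(') ? 1 : -1;
            '';
        } else {
            # A run of non-parens survives only outside all groups.
            $depth == 0 ? $2 : '';
        }
    }gex;
    return $s;
}

print strip_by_depth('111(22(33)44)55'), "\n";  # 11155
print strip_by_depth('a(b)c(d)e'), "\n";        # ace
```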

    We're not really tightening our belts, it just feels that way because we're getting fatter.
Re: regular expression paranthesis remover
by hv (Prior) on Jun 25, 2004 at 13:32 UTC

    If you don't expect to be dealing with particularly long and complex strings, it should be sufficient to loop deleting contents from the inner parens outwards:

    1 while $string =~ s/\([^(]*?\)//gs;

    (Note that the simpler pattern s/\(.*\)// will not do the right thing on a string like "a(b)c(d)e", since it will remove the "c" as well. And you can't fix that by making the .* a minimal .*?, since that fails on nested parens.)
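    A short script, added here purely as an illustration, makes the three behaviours easy to compare. The variable names are made up for the example.

```perl
use strict;
use warnings;

my $flat   = 'a(b)c(d)e';   # two sibling groups
my $nested = 'a(b(c)d)e';   # one nested group

# Greedy .* runs from the first "(" to the last ")", eating the "c".
(my $greedy = $flat) =~ s/\(.*\)//;

# Minimal .*? stops at the first ")", which breaks nested groups.
(my $minimal = $nested) =~ s/\(.*?\)//;

# Deleting innermost groups repeatedly handles both shapes.
my $looped_flat = $flat;
1 while $looped_flat =~ s/\([^(]*?\)//gs;
my $looped_nested = $nested;
1 while $looped_nested =~ s/\([^(]*?\)//gs;

print "$greedy\n";         # ae
print "$minimal\n";        # ad)e
print "$looped_flat\n";    # ace
print "$looped_nested\n";  # ae
```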

    With long strings this approach will start to suffer from the need to repeatedly scan from the beginning of the string for each level of nesting, and then you may be better off with a solution that does a single pass over the string with Regexp::Common::balanced. However, the additional complexity of the balanced paren matcher makes it a lot slower, so it won't be a win unless the string is really long or has parens nested quite deeply.

    Hugo

Re: regular expression paranthesis remover
by Anonymous Monk on Jun 25, 2004 at 07:55 UTC