varian has asked for the wisdom of the Perl Monks concerning the following question:

Hi fellow Monks,
Can anyone think of a regex that parses a string containing items of arbitrary characters, grouped together in parentheses, into a list? It should only split on the 'outermost' parentheses, though.
E.g.
    $source = '(aaa) (bbb (ccc( ddd) eee) (fff)'
using:
    (@destination) = ($source =~ / ... / );
would need to result in:
    $destination[0] = 'aaa'
    $destination[1] = 'bbb (ccc( ddd) eee'
    $destination[2] = 'fff'
Clearly this could be solved by using loops, functions, etc., yet surely there must be a way to achieve this with a single regex?

I played a bit with forward/backward anchoring etc., yet the possible (non-balanced) nesting of parentheses makes it an intriguing challenge.

Replies are listed 'Best First'.
Re: regex to parse (nested) parenthesis delimited string?
by davorg (Chancellor) on Feb 06, 2007 at 16:44 UTC
Re: regex to parse (nested) parenthesis delimited string?
by imp (Priest) on Feb 06, 2007 at 16:45 UTC
    Text::Balanced handles this nicely.
    use strict;
    use warnings;
    use Text::Balanced qw(extract_bracketed);

    my $text = '(aaa) (bbb (ccc( ddd) eee) (fff)';
    my @tokens = extract_bracketed($text, '()');
    print "$_\n" for @tokens;
    # Output:
    #(aaa)
    # (bbb (ccc( ddd) eee) (fff)
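A single call to extract_bracketed only peels off the first group and hands back the rest of the string. A minimal sketch of calling it in a loop to collect every top-level group, assuming the input is balanced (the sample string is the balanced variant with one extra closing parenthesis; the variable names are only illustrative):

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Balanced variant of the sample string (assumption: one ')' added after "eee").
my $text = '(aaa) (bbb (ccc( ddd) eee)) (fff)';

my @groups;
my $rest = $text;
while (1) {
    # In list context the call returns (extracted, remainder, skipped prefix);
    # leading whitespace is skipped by default.
    my ($group, $remainder) = extract_bracketed($rest, '()');
    last unless defined $group && length $group;   # nothing (more) to extract
    push @groups, $group;
    $rest = $remainder;
}

print "$_\n" for @groups;
# (aaa)
# (bbb (ccc( ddd) eee))
# (fff)
```

On the original, unbalanced string the loop stops after '(aaa)', because the second group never closes.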
      Thanks all for the feedback.

      Indeed, Text::Balanced might be a better solution, although the suggested code fails to recognize '(fff)' as a separate, third item (probably because the parentheses contain an unbalanced nested parenthesis).

        Garbage in.. :)
        "probably because the parentheses contain an unbalanced nested parenthesis"
        Having unbalanced nested parentheses invalidates your question. You asked for the following:
            input:  '(aaa) (bbb (ccc( ddd) eee) (fff)'
            output: '(aaa)' '(bbb (ccc( ddd) eee)' '(fff)'
        but the unbalancing could have been resolved as in imp's answer, or alternatively as follows:
            input:  '(aaa) (bbb (ccc( ddd) eee) (fff)'
            output: '(aaa)' '(bbb' '(ccc( ddd) eee)' '(fff)'
        As a matter of fact, I would understand either taking them all like imp (i.e. assuming that the missing parentheses are all at the end) or this last one (i.e. assuming that the missing parentheses close the smallest possible group), but I don't see a general rule behind your choice.
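The ambiguity can be made concrete by counting nesting depth over the sample string. This is only an illustrative sketch (the variable names are made up here), not something any of the modules in this thread do in this form:

```perl
use strict;
use warnings;

my $source = '(aaa) (bbb (ccc( ddd) eee) (fff)';

my $depth = 0;   # current nesting depth
my $open_at;     # offset of the most recent top-level '('

for my $i (0 .. length($source) - 1) {
    my $ch = substr($source, $i, 1);
    if ($ch eq '(') {
        $open_at = $i if $depth == 0;   # a new top-level group starts here
        $depth++;
    }
    elsif ($ch eq ')') {
        $depth-- if $depth > 0;
    }
}

printf "final depth: %d\n", $depth;                        # final depth: 1
printf "unclosed group: %s\n", substr($source, $open_at);  # unclosed group: (bbb (ccc( ddd) eee) (fff)
```

The depth never returns to zero after the group that opens at "(bbb", so any split of that tail into separate items has to guess where the missing parenthesis belongs.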

        Flavio
        perl -ple'$_=reverse' <<<ti.xittelop@oivalf

        Don't fool yourself.
Re: regex to parse (nested) parenthesis delimited string?
by kyle (Abbot) on Feb 06, 2007 at 16:45 UTC
Re: regex to parse (nested) parenthesis delimited string?
by fenLisesi (Priest) on Feb 06, 2007 at 18:11 UTC
    You could use a canned /regexp?/i as follows:
    use strict;
    use warnings;
    use Regexp::Common qw(balanced);

    my @streams = (
        '(aaa) (bbb (ccc( ddd) eee)) (fff)',
        ' (a) ((bc (d)) ef) h',
        '(a)',
        '(a) ',
        '(a) (bc)',
        '',
    );
    my $PATTERN = $RE{balanced}{-keep};
    for my $input (@streams) {
        print qq("$input" => );
        my @pieces = ($input =~ /$PATTERN/g);
        printf qq(%s\n), join qq( ), map {qq("$_")} @pieces;
    }
    which prints:
    "(aaa) (bbb (ccc( ddd) eee)) (fff)" => "(aaa)" "(bbb (ccc( ddd) eee))" "(fff)"
    " (a) ((bc (d)) ef) h" => "(a)" "((bc (d)) ef)"
    "(a)" => "(a)"
    "(a) " => "(a)"
    "(a) (bc)" => "(a)" "(bc)"
    "" =>
    This assumes that you had a typo in your post and you actually want balanced parens. Cheers.
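For balanced input, the same result can also be had without a module by writing the recursion into the pattern directly. A minimal sketch, assuming Perl 5.10 or later (for the recursive `(?1)` construct and possessive quantifiers); this is not how Regexp::Common builds its pattern, just an alternative:

```perl
use strict;
use warnings;
use 5.010;

my $input = '(aaa) (bbb (ccc( ddd) eee)) (fff)';

# Group 1 matches one balanced group: an opening parenthesis, then any mix of
# non-parenthesis runs and nested balanced groups, then a closing parenthesis.
my $balanced = qr/
    (
        \(
        (?: [^()]++ | (?1) )*
        \)
    )
/x;

my @groups = $input =~ /$balanced/g;

# Strip the outer parentheses to get the items the question asked for.
my @items = map { substr($_, 1, -1) } @groups;

print "$_\n" for @items;
# aaa
# bbb (ccc( ddd) eee)
# fff
```

Note that `(?1)` refers to the first capture group of the whole regex, so the sketch only works as written while `$balanced` is the first group in whatever pattern it is interpolated into.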
      Thanks for the pointer, very useful module indeed and it provides what I needed!