embirath has asked for the wisdom of the Perl Monks concerning the following question:

Hi everyone
I was wondering if there is an easy way to search a string for matching parentheses, using the m// and s/// methods.

What I mean is the following. Say we have the string that looks like this:
Parameter1(info, blah), Parameter2(new info(DN), blah)...

I want to be able to extract each parameter and its information within the parentheses. So I want my search procedure to ignore the parentheses "(DN)" in the second parameter. So I want to be able to extract:
"Parameter1(info, blah)", and
"Parameter2(new info(DN), blah)"

as two new strings.

I can probably sit down and write a page long program to do it... but I'm wondering if there is a short-cut available. Some kind of notation within the // that tells it to "find next parentheses, and ITS matching end-parentheses"...

Let me know if you have any ideas!
Thanks!
Emma

Replies are listed 'Best First'.
Re: search for matching parentheses
by davorg (Chancellor) on Oct 20, 2006 at 15:26 UTC

    I think your life would be a lot happier if you abandoned the idea of using regular expressions for this task and looked at Text::Balanced instead.

    --
    <http://dave.org.uk>

    "The first rule of Perl club is you do not talk about Perl club."
    -- Chip Salzenberg

      Wow, thank you ALL for your prompt replies! So many options... :-)
      I decided to check out the module Text::Balanced, since it seems it could be useful to me not just for this problem but in the future.
      But... I'm having some problems with it. I'm wondering why the below code does not work. I'm expecting this piece of code to extract "(x,y,z)", and put ", Param2(1,2,3), Param3(a,b,c)" in the remainder, and "Param1" in the prefix. But it doesn't work. It puts everything in the remainder.

          $string = "Param1(x,y,z), Param2(1,2,3), Param3(a,b,c)";
          print "Original string: ", $string, "\n";
          ($ext,$rem,$pre) = extract_bracketed($string,'()','.*?');
          print "extracted: ", $ext, "\n";
          print "remainder: ", $rem, "\n";
          print "skip pref: ", $pre, "\n";

      Any help would be greatly appreciated.
      Thanks! Emma

        It seems to work like this:

            use Text::Balanced qw(extract_bracketed);

            my $string = "Param1(x,y,z), Param2(1,2,3), Param3(a,b,c)";
            print "Original string: ", $string, "\n";
            my ($ext,$rem,$pre) = extract_bracketed($string, '()', '[^(]*');
            print "extracted: $ext\n";
            print "remainder: $rem\n";
            print "prefix: $pre\n";

        All I've really changed is the prefix regex. I'm now saying that the prefix is zero or more characters that are not an opening bracket. I'm a little confused tho' - I'd expect that to be the default behaviour.
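
        As a rough sketch of how the same call could be looped to pull out every parameter from the string in the original question: the names $rest, @found and $name are only illustrative, and taking the parameter name from the last word of the skipped prefix is an assumption about the input format, not something Text::Balanced does for you.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $rest = 'Parameter1(info, blah), Parameter2(new info(DN), blah)';
my @found;

while (length $rest) {
    # Prefix pattern: skip everything up to the next opening bracket.
    my ($ext, $rem, $pre) = extract_bracketed($rest, '()', '[^(]*');
    last unless defined $ext && length $ext;    # no balanced group left

    # Assumption: the parameter name is the last word of the skipped prefix.
    my ($name) = $pre =~ /(\w+)\s*\z/;
    push @found, (defined $name ? $name : '') . $ext;

    # Continue with whatever follows the extracted group.
    $rest = $rem;
}

print "$_\n" for @found;
```

        Feeding the remainder back in on each pass keeps the loop independent of how pos() is handled between calls.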


Re: search for matching parentheses
by liverpole (Monsignor) on Oct 20, 2006 at 15:18 UTC
    Hi embirath,

    It would be easy with a regex:

        my $str = "param1(info, blah) param2(new info, more blah)";
        while ($str =~ /(\S+\(.*?\))/g) {
            my $pattern = $1;
            print "Found pattern '$pattern'\n";
        }

    except that when you're adding to the equation the ability to detect nested parenthetical groupings (ie. parentheses within parentheses):

    Parameter2(new info(DN), blah)

    then it makes things a lot more complex.

    So you may have to rethink whether that's part of the requirement or not, and if so, how you want to parse it.
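
    To make the problem concrete, here is a small sketch that runs the same non-greedy regex over the nested string from the original question. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $str = 'Parameter1(info, blah), Parameter2(new info(DN), blah)';
my @found;

# Same pattern as above: non-space characters, "(", then as little
# as possible up to the first ")".
while ($str =~ /(\S+\(.*?\))/g) {
    push @found, $1;
}

print "Found pattern '$_'\n" for @found;
# The second match stops at the first ")" it meets, so it ends after
# "(DN)" and the trailing ", blah)" is lost.
```

    The non-greedy .*? has no notion of depth, so it cannot tell the inner closing parenthesis from the outer one.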


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: search for matching parentheses
by smammy (Novice) on Oct 20, 2006 at 15:43 UTC

    I use this regex to match (but not capture) nested parens. It's from the Camel, 3rd. ed., p. 314.

        my $np;
        $np = qr{
            \(
            (?:
                (?> [^()]+ )
              |
                (??{ $np })
            )*
            \)
        }x;

      I suppose you could use $np like this:

          while (m/\G.*?(\w+)($np)/g) {
              my $name = $1;
              # get rid of outermost parens
              my $value = substr($2, 1, length($2) - 2);
              print "name=$name value=$value\n";
          }
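
      On Perl 5.10 or later, the same idea can also be sketched with a recursive group reference, (?2), instead of the (??{ ... }) construct. This is an alternative spelling, not the Camel's version, and the variable names are only illustrative.

```perl
use strict;
use warnings;

my $str = 'Parameter1(info, blah), Parameter2(new info(DN), blah)';
my @params;

# Group 1 is the name; group 2 is a balanced (...) group that recurses
# into itself via (?2) whenever it meets a nested opening parenthesis.
while ($str =~ /(\w+)(\((?:[^()]++|(?2))*\))/g) {
    push @params, "$1$2";
}

print "$_\n" for @params;
```

      After the match, $1 and $2 hold the values from the outermost level, so the nested "(DN)" stays inside the second parameter.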
Re: search for matching parentheses
by wojtyk (Friar) on Oct 20, 2006 at 15:36 UTC
    I ran into this problem a while ago and wrote some code to handle it. It feels like a kludge, but it works:

        my @args = ( split /,/, $line );
        my $size = $#args;
        # ugly hack to rejoin args that were incorrectly split (internal commas)
        for ( my $i = 0; $i < $size; $i++ ) {
            my $depth = 0;
            $depth++ while $args[$i] =~ /[(]/g;
            $depth-- while $args[$i] =~ /[)]/g;
            if ( $depth ) {
                splice( @args, $i, 2, $args[$i] . "," . $args[$i+1] );
                $i--;
                $size--;
            }
        }

    Basically, it splits the line on commas and then checks that the parentheses in each piece balance. If they don't, it splices in the next piece and recounts. So after you run your string through this code, @args should hold the correctly separated smaller strings.

    Once you have the big string correctly split into the smaller strings, the regex is as simple as this:

        # greedy .* runs to the last ")" so nested parens stay inside $2
        $args[$i] =~ /^\s*(\w+)\((.*)\)\s*$/s;
        my $param = $1;
        my $stuffinsideparens = $2;
    If you really wanted to, you could just append this code at the bottom of the for loop.
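
    The same depth-counting idea can also be sketched as a single pass over the characters, splitting only on commas that sit outside all parentheses. The helper name split_top_level is hypothetical, and the sketch assumes the parentheses in the input are balanced.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are not inside parentheses.
sub split_top_level {
    my ($line) = @_;
    my @parts;
    my $depth   = 0;
    my $current = '';

    for my $char (split //, $line) {
        $depth++ if $char eq '(';
        $depth-- if $char eq ')';
        if ($char eq ',' && $depth == 0) {
            # A top-level comma ends the current piece.
            push @parts, $current;
            $current = '';
        }
        else {
            $current .= $char;
        }
    }
    push @parts, $current if length $current;

    s/^\s+|\s+$//g for @parts;    # trim surrounding whitespace
    return @parts;
}

my @args = split_top_level('Parameter1(info, blah), Parameter2(new info(DN), blah)');
print "$_\n" for @args;
```

    This avoids the split-then-rejoin step, at the cost of walking the string one character at a time.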
Re: search for matching parentheses
by johngg (Canon) on Oct 20, 2006 at 16:21 UTC
    This node had some good recipes and explanations for matching nested brackets. My final script which owes all to ikegami's advice follows.

    Output follows

    I hope this is of use.

    Cheers,

    JohnGG

Re: search for matching parentheses
by fenLisesi (Priest) on Oct 20, 2006 at 17:41 UTC
        use strict;
        use warnings;
        use Regexp::Common qw(balanced);

        my $input = q|Param1(info, blah), Param2(new info(DN), blah), camel Param3(), LLama, Param4 (stuff)|;
        my $PATTERN = $RE{balanced}{-parens=>'()'};
        my @pieces = ();
        INPUT: while (length $input) {
            $input =~ s{\A [^\w\(]+ }{}x; ## skip leading junk
            if ($input =~ s{ (\w* $PATTERN) }{}x ) {
                push @pieces, $1;
                next INPUT;
            }
            last INPUT;
        }
        printf "%s\n", join qq(\n), @pieces;

    prints

        Param1(info, blah)
        Param2(new info(DN), blah)
        Param3()
        (stuff)

    cheers
Re: search for matching parentheses
by geekondemand (Scribe) on Oct 21, 2006 at 05:13 UTC
    I think this is from Tom Christiansen's Perl Cookbook, but it's not handy right now... so here goes:

        my $np;
        $np = qr{
            \(
            (?:
                (?> [^()]+ )    # Non-capture group w/o backtracking
              |
                (??{ $np })     # Group with matching parens
            )*
            \)
        }x;

    I think you could also look at the Text::Balanced module's extract_bracketed subroutine to see how it's done there.

    Match time pattern interpolation makes Perl's regexes much more powerful than most regex engines... not quite the power of a grammar/parser but still quite powerful, given their compactness. Of course, this is just an intuition on my part... is there a proof (or counterexample) that shows that regexes cannot in general do what a more general parser can do?