Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How do I parse a string containing nested functions with parameters, e.g.
Example 1: A(B(2)) Example 2: A(B(2)+C(D()))
I need to find matching pairs of parenthesis and call a sub that expands each function. My first attempt was to use recursion and a regex with greedy search. This works for example 1, but fails for example 2 because the regex does not find the matching parenthesis:
#!/usr/bin/perl -wT # example 1 my $text = "A(B(2))"; $text = doFunc( "MAIN", $text ); print "Example 1: $text\n"; # example 2 $text = "A(B(2)+C(D()))"; $text = doFunc( "MAIN", $text ); # this fails! print "Example 2: $text\n"; sub doFunc { my( $theFunc, $theParam ) = @_; # FIXME: greedy match fails with: 'TT(TT()+TT(TT()))' $theParam =~ s/([A-Z]+)\((.*)\)/&doFunc($1,$2)/geo; my $result = ""; if( $theFunc eq "MAIN" ) { $result = $theParam; } else { # dummy switch here for demonstration # this would be an elsif for each function $result = "func $theFunc returns <$theParam>"; } return $result; }
Any help is greatly appreciated. Thanks!

Replies are listed 'Best First'.
Re: How to find matching pairs
by ajwans (Scribe) on Mar 12, 2002 at 02:18 UTC
    This kind of thing cannot be specified by regular expressions except in the trivial case where you only have one pair of brackets. In order to properly specify and parse complex expressions you could use Damian Conway's excellent Parse::RecDescent.

    Using this you can specify your input as a context free grammar. Your grammar would probably be something like

    start: expression expression: expression m/[+-/*]/ expression '(' expression ')' function '(' expression ')' identifier number


    There is an excellent FAQ on Parse::RecDescent available on the web.

    1. dude, what does mine say?
    2. "sweet", what about mine?
    3. "dude", what does mine say?
    4. GOTO 2
      You can actually, since Perl's regexes can evaluate code and interpolate patterns at regex run-time. This enables recursive regexes, see the perlre manpage for an example.

      Cheers,
      -Anomo
Re: How to find matching pairs
by rnahi (Curate) on Mar 12, 2002 at 07:31 UTC
    Also Regexp::Common would meet your needs.
    use Regexp::Common; my $parens = qr/$RE{balanced}{-parens=>'()'}/; # ... # my fix $theParam =~ s/([A-Z]+)($parens)/&doFunc($1,$2)/geo; __END__ Using your script, it will print Example 1: func A returns <(func B returns <(2)>)> Example 2: func A returns <(func B returns <(2)>+func C returns <(func + D returns <()>)>)>
Re: How to find matching pairs
by I0 (Priest) on Mar 12, 2002 at 09:47 UTC
    $text = "A(B(2)+C(D())),\nTT(TT()+TT(TT()))"; $text = doFunc( "MAIN", $text ); # print "Example: $text\n"; sub doFunc{ my( $theFunc, $theParam ) = @_; # FIXME: greedy match fails with: 'TT(TT()+TT(TT()))' my $re; {local $^W=0; ($re=$theParam)=~s/((\()|(\))|.)/${[')']}[!$3]\Q$1\E$ +{['(']}[!$2]/gs; $re= join'|',map{quotemeta}eval{$theParam=~/$re/};} die $@ if $@=~/unmatched/; $theParam =~ s/([A-Z]+)\(($re)\)/&doFunc($1,$2)/geo; my $result = ""; if( $theFunc eq "MAIN" ) { $result = $theParam; } else { # dummy switch here for demonstration # this would be an elsif for each function $result = "func $theFunc returns <$theParam>"; } return $result; }
    Update: Why does this dump core when the /o is removed?
Re: How to find matching pairs
by Anonymous Monk on Mar 12, 2002 at 05:38 UTC
    The easiest alternative is probably to install Text::Balanced, distributed with Parse::RecDescent, and use the appropriate function in that module.

    Cheers,
    -Anomo
Re: How to find matching pairs - thanks!
by Anonymous Monk on Mar 12, 2002 at 22:07 UTC
    Thanks for all the good advice, I really appreciate it! I am impressed with the quick replies. Here is another solution I came up with that does not depend on any Perl library. It is not the most elegant, but it works:
    #!/usr/bin/perl -wT $escToken = "\263"; parseText( "Example 1: A(B(2))" ); parseText( "Example 2: A(B(2)+C(D()))" ); sub parseText { my( $theText ) = @_; my $level = 0; print "== $theText\n"; # add nesting level to parenthesis, # e.g. "A(B())" gets "A-esc-1(B-esc-2(-esc-2)-esc-1)" $theText =~ s/([\(\)])/addNestingLevel($1, \$level)/geo; $theText = doFunc( "MAIN", $theText ); # clean up unbalanced mess $theText =~ s/$escToken\-*[0-9]+([\(\)])/$1/go; print "-> $theText\n"; return $theText; } sub addNestingLevel { my( $theParen, $theLevelRef ) = @_; my $result = ""; if( $theParen eq "(" ) { $$theLevelRef++; $result = "$escToken$$theLevelRef$theParen"; } else { $result = "$escToken$$theLevelRef$theParen"; $$theLevelRef--; } return $result; } sub doFunc { my( $theFunc, $theParam ) = @_; $theParam =~ s/([A-Z]+)$escToken([0-9]+)\((.*?)$escToken\2\)/&doFunc +($1,$3)/geo; my $result = ""; if( $theFunc eq "MAIN" ) { $result = $theParam; } else { # dummy switch here for demonstration # this would be an elsif for each function $result = "func $theFunc returns <$theParam>"; } return $result; }
    -- Peter at Thoeny.com - http://TWiki.org/