Parsing nested parentheses

JeffR100 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Parsing nested parentheses by Abigail-II (Bishop) on Nov 18, 2003 at 19:55 UTC
Once you know how to write a regex to match a string with balanced parens, it's not to hard to write a regex to do what you want. How does a string with balanced parens look like? It starts with an opening paren, then "junk" separated by strings with balanced parens, where "junk" is a string consisting of something that isn't a paren. Quite like a delimited string with escaped delimiters actually. Now, how do we go from string with balanced parens to the requested string? Simple. We start with an opening paren, then junk string separated by strings with balanced parens, then "C", then junk strings separated by strings with balanced parens, and finally a closing paren. So much for text. Here's the code: #!/usr/bin/perl use strict; use warnings; no warnings qw /syntax/; my $bal; $bal = qr /[(] [^()]* (?:(??{ $bal }) [^()]* )* [)]/x; my $re = qr /[(] [^()]* (?:(??{ $bal }) [^()]* )* C [^()]* (?:(??{ $bal }) [^()]* )* [)]/x; while (<DATA>) { chomp; print "$_: "; print $& if /$re/; print "\n"; } __DATA__ ((A, B), C, (D, E)) (((A, B), C, (D, E))) ((A, B), D, (C, E)) ((A, C), D, (B, E)) (((A, C)), D, (B, E)) ((A, (C)), D, (B, E)) A, B, C, D, E ((A, B), F, (D, E)) ((A, B), C, (D, E)): ((A, B), C, (D, E)) (((A, B), C, (D, E))): ((A, B), C, (D, E)) ((A, B), D, (C, E)): (C, E) ((A, C), D, (B, E)): (A, C) (((A, C)), D, (B, E)): (A, C) ((A, (C)), D, (B, E)): (C) A, B, C, D, E: ((A, B), F, (D, E)): [download] (I cooked up this regex while I was walking home from the train station. It turned out to work at the first try). Abigail	[reply] [d/l]
Re: Re: Parsing nested parentheses by husker (Chaplain) on Nov 18, 2003 at 22:17 UTC
(I cooked up this regex while I was walking home from the train station. It turned out to work at the first try). Amazing. The most constructive thing I think about on the way home from the train station is if I will be able to get my "welcome home" beer open on the first try.	[reply]
Re: Parsing nested parentheses by hardburn (Abbot) on Nov 18, 2003 at 17:20 UTC
AFAIK, CPAN modules for matching balanced text are only going to return what is in the parens, not filter out information inside them. You would use the module to grab the text, then parse it yourself. A few modules I can think of are Text::Balanced and Regexp::Common::balanced. ---- I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident. -- Schemer `: () { :\|:& };:` Note: All code is untested, unless otherwise stated	[reply] [d/l]
Re: Parsing nested parentheses by diotalevi (Canon) on Nov 18, 2003 at 17:17 UTC
Simple? Perhaps. For the moment have a gander at Maximal match in a recursive regex which is the bare minimum required if your going to merely capture maximum. My first thought is to follow the $re check with a code block that can check the just-captured result for the contents of your search. `for ( A .. E ) { print "$_: "; print '((A, B), C, (D, E))' =~ match( $_ ); # Returns '(D, E)' print "\n"; } sub match { my $search = shift; my $re; eval qq[ \$re = qr/ \$ # Opening parenthese ((?: # Capture the contents [^()]+ # Body of a link \| (??{\$re}) # Or recurse )+) # and allow repeats internally \$ # Closing parenthese # Test that the just-closed capture block contains $search +. (?(?{ -1 != index \$^N, \$search })(?=)\|(?!)) /x; ] or die $@; return $re; }` [download]	[reply] [d/l]
Re: Re: Parsing nested parentheses by Anonymous Monk on Nov 18, 2003 at 17:57 UTC
That doesn't give any result when 'C' is the argument.	[reply]
Re: Re: Re: Parsing nested parentheses by diotalevi (Canon) on Nov 18, 2003 at 18:15 UTC
Oh too bad. It "should". There's also some funny action with stuff being cached. For kicks, try a `for ( A .. E ) { ... =~ match( $_ ) }` and you'll notice that the expression never shifts off of its initial search.	[reply] [d/l]
Re: Parsing nested parentheses by TheDamian (Vicar) on Nov 18, 2003 at 20:15 UTC
This is reasonably straightforward with Regexp::Common, though you do have to "roll-your-own" a little: `use Regexp::Common; my $balanced = qr/[^()]+\|$RE{balanced}{-parens=>'()'}/; sub extract { my ($want, $from) = @_; my $nested = qr/$balanced* \Q$want\E $balanced*/x; $from =~ m/( $ $nested $ \| \Q$want\E )/x; return $1; } my $expr = '(((B,G),A,(A,B)),C,(D,(E))),F'; for ('A'..'H') { print "$_: ", extract($_,$expr),"\n"; }` [download] Note that I've assumed that, in the edge case where the target is unnested (e.g. 'F' in the above example), the extraction should just return target itself. Also note the standard regexish "left-most-longest-match" behaviour when the target appears more than once (as do targets 'A' and 'B' in the example).	[reply] [d/l]
Re: Re: Parsing nested parentheses by JeffR100 (Acolyte) on Nov 18, 2003 at 22:57 UTC
I tried this out, but it give a segmentation fault on the line: $from =~ m/( $ $nested $ \| \Q$want\E )/x; Any ideas?	[reply]
Re: Re: Re: Parsing nested parentheses by TheDamian (Vicar) on Nov 18, 2003 at 23:42 UTC
Any ideas? Not without a little more information. What version of perl are you using? What version of Regexp::Common?	[reply]
Re: Parsing nested parentheses by fletcher_the_dog (Friar) on Nov 18, 2003 at 19:46 UTC
Here is a simple way to do it by keeping track of how deep you are into parenthesis' and what depth your match occured at: use strict; my $str = "((A,B),C,(D,E))"; foreach my $m (qw(A B C D E)) { print $m." => ".FindMinParens($m,$str)."\n"; } sub FindMinParens{ my $match = shift; my $string = shift; my @start_pos; my $depth = -1; # keep matching parens or $match while ($string=~/([()]\|$match)/g) { if ($1 eq "(") { # record opening paren positions push @start_pos,pos($string); } elsif ($1 eq ")") { # if we reached the closing parens for the minimum pair # get the sub string and exit if ($#start_pos == $depth) { my $start = $start_pos[-1]; my $len = pos($string) - $start -1; return substr($string,$start,$len); }else{ pop @start_pos; } } else { # store depth of $match $depth = $#start_pos; } } return ""; } __OUTPUT__ A => A,B B => A,B C => (A,B),C,(D,E) D => D,E E => D,E [download]	[reply] [d/l]
Re: Parsing nested parentheses by shotgunefx (Parson) on Nov 18, 2003 at 18:29 UTC
This is probably a klunky solution, but it reminded me of a parser I had written.. (Evaluate Expressions.) Quickly hacking at it and only testing a few simple cases (you've been warned) You could use the returned structure to search for whatever criteria you desire. I'm sure someone here has a much more elegant solution, but my regex abilities are rather limited... #!/usr/bin/perl use warnings; use strict; use Data::Dumper; my $input = '(a,b,(c,d,(e,f,g)))'; print Dumper parse_expression($input); sub parse_expression { my $exp = shift; my @tokens = (); $exp=~s/\s([()])\s/ $1 /go; # Get tokens push @tokens, $1 while $exp=~/\G\s(".?")/gc or $exp=~/\G\s('.? +')/gc or $exp=~/\G\s(\S+)/gc; # Find any parens. my (@lp,@rp) = (); for (my $p =0; $p < @tokens; $p++){ if ($tokens[$p] eq '('){ push @lp,$p; }elsif($tokens[$p] eq ')'){ push @rp,$p; } } if ( @lp != @rp){ warn "Mismatched parens in expression.\n"; return; } my @temp = @tokens; for (my $i=0; $i < @rp; $i++){ my @wanted; for (my $j = $#lp; $j >= 0 ; $j--){ if ( defined $lp[$j] && $lp[$j] < $rp[$i] ){ (undef,@wanted) = @tokens[ $lp[$j] .. ($rp[$i] - 1 ) +] ; @tokens[ $lp[$j] .. ($rp[$i]) ] = [ grep {defined $_ +} @wanted]; push @temp, map {split /\s,\s*/} @wanted; $lp[$j] = $rp[$i] = undef; last; } } } return $tokens[0]; } __DATA__ # OUTPUT $VAR1 = [ 'a,b,', [ 'c,d,', [ 'e,f,g' ] ] ]; [download] -Lee "To be civilized is to deny one's nature."	[reply] [d/l]
Re^2: Parsing nested parentheses by bassfox (Initiate) on Jun 19, 2008 at 08:20 UTC
Hi there, ever thought of using the perl internal functions to import a nested data structure to a LoL? If you do not have to avoid using eval, you could do it real simple like this: `#!/usr/bin/perl -w use strict; use Data::Dumper; my $string = "(a,b,(c,d,(e,f,g)))"; $string =~ s/([^,]+)/"$1"/gsm; # quote content $string =~ tr//\[\]/; # replace braces my @LoL; # define list eval("\@LoL = $test;"); # fill LoL print Dumper @LoL; __DATA__ Output: $VAR1 = [ 'a', 'b', [ 'c', 'd', [ 'e', 'f', 'g' ] ] ];` [download] It is not extensively tested, but should work for most items. - frank	[reply] [d/l]
Re: Parsing nested parentheses by Anonymous Monk on Nov 18, 2003 at 17:55 UTC
One way using regular expressions: `my $expr = '((A,B),C,(D,E))'; print extract('C',$expr),"\n"; print extract('B',$expr),"\n"; print extract('D',$expr),"\n"; sub extract { my $char = shift; my ($str,$dup) = (shift) x 2; 1 while $dup =~ s/$[^()$char]$/'.' x length $&/e; $dup =~ m/($[^()$char]$char[^()]*$)/; return substr($str,$-[1],length $1); } __END__ # output: ((A,B),C,(D,E)) (A,B) (D,E)` [download] But you probably want to build a small expression parser.	[reply] [d/l]
Re: Re: Parsing nested parentheses by melora (Scribe) on Nov 18, 2003 at 18:28 UTC
Being an ancient C programmer, I would probably recurse through the string, inspecting a character at a time. A "(" would mean a call to myself with the remainder of the string. Other characters (except for ")") I would load up into a string, which would be all the stuff within a (). No, better, load them into separate strings, starting a new string after each comma. A ")" would mean returning, after inspecting the above string(s) for a match. If you find a match, I guess you print out that list, comma delimited just like it was in the original string? Or you could just print the substring (of the original string) just inspected. And yes, you could use a regex for the comparison, but you could just use "eq" to compare the strings, since the commas and parens would already be gone. To a true Perl programmer, this is probably brute force and ignorance, but I've used this approach before and it can work.	[reply]
Re: Parsing nested parentheses by welchavw (Pilgrim) on Nov 19, 2003 at 17:53 UTC
This will not work if all tokens are not the same length, because of the reliance on `index`, which seems ok given the sample dataset provided. Also, its probably not the best approach, but I believe it works and I had fun producing it. use strict; use warnings; my $val = 'X'; my $min_len_subgroup; sub chk { my $subgroup = shift; if(-1 != index($subgroup, $val)) { if (!defined $min_len_subgroup \|\| length $subgroup < length $min_l +en_subgroup) { $min_len_subgroup = $subgroup; } } } our $re = qr{ ( $ (?: (?> [^()]+ ) \| (??{ $re }) )* $ ) (?{ chk($1) }) }x; my @pats = ( "(1(2(X)))", "(1(2(X,a))) (3(X,e,f))", "(1(2(X,b,c))) (3(X,f))", "(1(2(X,a))) (X,b)", "(1(2(X,a))) (X,b) X", "(1(2(X,a))) (X,b) (X)", ); for my $pat (@pats) { undef $min_len_subgroup; if (() = ($pat =~ /$re/g)) { print "$min_len_subgroup\n"; } } [download] ,welchavw	[reply] [d/l] [select]
Re: Parsing nested parentheses by Anonymous Monk on Nov 21, 2003 at 07:21 UTC
`my $expr = '((A,B),C,(D,E))'; print extract('C',$expr),"\n"; print extract('B',$expr),"\n"; print extract('D',$expr),"\n"; sub extract { my $char = shift; local $_ = shift; (my $re=$_)=~s/(($)\|($)\|.)/$2\Q$1\E$3/gs; return (sort{length $a <=> length $b}(grep/$char/,eval{/$re/},$@ & +& die $@))[0]; }` [download]	[reply] [d/l]


Welcome to the Monastery
	PerlMonks