Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

How do I split on complicated possibilities? For instance, "a,b,op2(c,d),op3(e,op4(f,g))" should be split into
a
b
op2(c,d)
op3(e,op4(f,g))
where c,d,e,f,g may be of the same form op(x,y).

In other words, I only want the split to occur at the highest level of commas and ignore the nested ones.

Originally posted as a Categorized Question.


Replies are listed 'Best First'.
Re: How do I split a string on highly structured/nested data?
by lhoward (Vicar) on Jun 07, 2000 at 03:50 UTC
    One approach is to use a parser like Parse::RecDescent. A real parser (as opposed to parsing a string with a regular expression alone) is much more powerful and can be more appropriate for parsing highly structured/nested data like your example.
    use Parse::RecDescent;

    my $teststr = "a,b,op2(c,d),op3(e,op4(f,g))";

    my $grammar = q {
        content:   /[^\)\(\,]+/
        function:  content '(' list ')'
        value:     content
        item:      function | value
        list:      item ',' list | item
        startrule: list
    };

    my $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

    defined $parser->startrule($teststr) or print "Bad text!\n";
    For other approaches see the discussion on Balancing Parens.
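For comparison, one parser-free approach is to walk the string and track parenthesis depth, splitting only on commas seen at depth zero. The following is a minimal sketch, not taken from the linked discussion; the name split_top_level is hypothetical, and it assumes that "(" and ")" are the only nesting characters and that there is no quoting or escaping to honor.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are not inside parentheses.
sub split_top_level {
    my ($str) = @_;
    my @items;
    my $current = '';
    my $depth   = 0;

    for my $ch (split //, $str) {
        if ($ch eq '(') {
            $depth++;
        }
        elsif ($ch eq ')') {
            $depth--;
            die "Unbalanced ')'\n" if $depth < 0;
        }

        if ($ch eq ',' && $depth == 0) {
            # Top-level comma: finish the current item.
            push @items, $current;
            $current = '';
        }
        else {
            # Anything else, including nested commas, stays in the item.
            $current .= $ch;
        }
    }

    die "Unbalanced '('\n" if $depth > 0;
    push @items, $current;
    return @items;
}

print "$_\n" for split_top_level("a,b,op2(c,d),op3(e,op4(f,g))");
```

For the example string this prints the four top-level items, one per line. Unlike a grammar, it does not validate what sits between the commas; it only checks that the parentheses balance.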
Re: How do I split a string on highly structured/nested data?
by Anonymous Monk on Aug 17, 2000 at 03:10 UTC
    This should work more reliably in case there are repeated strings:
    $_ = "a,b,op2(c,d),(e),(f),(f,g),op3(e,op4(f,g))\n";
    print;
    ($re=$_)=~s/((\()|(\))|.)/$2\Q$1\E$3/gs;
    @$ = (eval{/$re/});
    die $@ if $@=~/unmatched/;
    $re = join'|',map{quotemeta}@$;
    print join"\n",/((?:$re|[^,])+)/g;
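On Perl 5.10 or later, a recursive pattern is another regex-only way to get the same split without building a second regex from the input. This is a sketch under assumptions rather than the code above: the name split_top_level_re is hypothetical, only "(" and ")" nest, and an empty field or unbalanced parenthesis is treated as an error.

```perl
use 5.010;
use strict;
use warnings;

# Hypothetical helper: each item is a run of plain characters and
# balanced parenthesized groups, followed by a comma or end of string.
sub split_top_level_re {
    my ($str) = @_;
    my @items;

    while ($str =~ /\G
            (                               # $1: one top-level item
              (?: [^,()]++                  # plain text, no commas or parens
                | (?<parens>                # a balanced group...
                    \( (?: [^()]++ | (?&parens) )* \)   # ...that may nest
                  )
              )+
            )
            (?: , | \z )                    # item ends at a comma or the end
        /xgc)
    {
        push @items, $1;
    }

    # With the c modifier, pos() still marks where matching stopped.
    my $end = pos($str) // 0;
    die "Unparsed input at offset $end\n" unless $end == length $str;
    return @items;
}

say for split_top_level_re("a,b,op2(c,d),(e),(f),(f,g),op3(e,op4(f,g))");
```

Commas inside a balanced group are consumed by the named parens subpattern, so only top-level commas end an item.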
Re: How do I split a string on highly structured/nested data?
by merlyn (Sage) on Aug 17, 2000 at 10:09 UTC
    lhoward's grammar seems unnecessarily complicated. Let's simplify it a bit, as well as grabbing what is needed for the answer (the split items):
    use Parse::RecDescent;

    my $teststr = "a,b,op2(c,d),op3(e,op4(f,g))";

    my $grammar = q {
        startrule: list
        list: <leftop: item ',' item>
        item: word '(' list ')' <commit>
                { "$item[1](".join(",",@{$item[3]}).")" }
            | word
        word: /\w+/
    };

    my $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

    defined (my $result = $parser->startrule($teststr)) or print "Bad text!\n";
    print map "<<< $_ >>>\n", @$result;
    Yes, there it is. $result is an array ref of the split-apart items.
Re: How do I split a string on highly structured/nested data?
by Anonymous Monk on Aug 17, 2000 at 10:02 UTC
    (I didn't mean more reliably than lhoward's answer, I meant more reliably than my previous answer, which seems to have been appropriately edited away)

    Originally posted as a Categorized Answer.

Re: How do I split a string on highly structured/nested data?
by Anonymous Monk on Aug 17, 2000 at 02:46 UTC

    Originally posted as a Categorized Answer.