karthikpa has asked for the wisdom of the Perl Monks concerning the following question:

I have a string something like this:
$abc = "abc,xyz,{1,2,3,4},18-90-89,{{1,2},{5,6,7,8}},yts";
I want to split this into tokens stored in an array (say @elems) such as
$elems[0] = abc
$elems[1] = xyz
$elems[2] = {1,2,3,4}
$elems[3] = 18-90-89
$elems[4] = {{1,2},{5,6,7,8}}
and so on.
The rules are:
1. Anything inside braces ({...}) is a single token, even though it may contain the general delimiter, the comma (,).
2. Braces can be nested.
3. No assumption can be made about the data, i.e. it is a general mix of alphanumeric characters.

A friend of mine told me that this is beyond the scope of regexps and is only possible using the Parse::RecDescent module, as the problem is similar to token parsing in a compiler.

Dear monks, please help me write Perl code to achieve this, either through a regex or with a module such as Parse::RecDescent.

Thanks in advance!
Karthik

Edited by Chady -- added formatting

Replies are listed 'Best First'.
Re: urgent perl regexp help needed
by Sec (Monk) on Jul 20, 2005 at 15:13 UTC
    With the help of "man perlre" it is easy. Just steal the "match parenthesized group" snippet, and use it as an alternative in the match you want.
    $abc = "abc,xyz,{1,2,3,4},18-90-89,{{1,2},{5,6,7,8}},yts";
    $par = qr! \{ (?: [^{}]+ | (??{ $par }) )+ \} !x;
    @a = $abc =~ /([^{,]+|$par)/g;
    print join("\n", @a);

      I'd use * instead of +, since he didn't say empty fields were not allowed. And what if the bracket is not the first character of a field? Fixed and documented:

      $_ = "abc,xyz,{1,2,3,4},18-90-89,{{1,2},{5,6,7,8}},yts";

      # Create a regexp to match nested brackets.
      my $par;  # Can't combine this with next line.
      $par = qr/
         \{                # Opening bracket.
         (?:               # Match zero or more
            [^{}]+         #   non-brackets
         |                 #   or
            (??{ $par })   #   bracketed content.
         )*
         \}                # Closing bracket.
      /x;

      # Extract fields from the line.
      my @elems = /
         \G                # Start where last match left off.
         (                 # Capture (return)
            (?:            #   zero or more
               [^{,]+      #     non-brackets, non-commas
            |              #     or
               $par        #     bracketed content.
            )*
         )
         (?: , | $)        # Match comma or end of line.
      /xg;

      print(join("\n", @elems));
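For readers on Perl 5.10 or later, the same idea can probably be written with named recursion, (?&name), instead of the deferred (??{ ... }) construct. The sketch below is not the code posted above: the names $field_re, braced and split_fields are made up for illustration, and it assumes that a stray closing brace should end the field. It also anchors each field on "start of string or comma", because the list-context /g match above appears to return one extra empty string once it reaches the end of the input.

```perl
use strict;
use warnings;

# Assumes Perl 5.10+ (named groups and (?&name) recursion).
my $field_re = qr/
    (?(DEFINE)
        (?<braced>              # A balanced {...} group:
            \{
            (?: [^{}]+          #   a run of non-brace characters
              | (?&braced)      #   or a nested balanced group
            )*
            \}
        )
    )
    (?: [^{},]+ | (?&braced) )* # A field: plain text and/or braced groups.
/x;

# Hypothetical helper: returns the top-level comma-separated fields.
sub split_fields {
    my ($line) = @_;
    my @elems;
    # Each field must follow either the start of the string or a comma.
    while ($line =~ / \G (?: ^ | , ) ( $field_re ) /xg) {
        push @elems, $1;
    }
    return @elems;
}

print join("\n", split_fields(
    "abc,xyz,{1,2,3,4},18-90-89,{{1,2},{5,6,7,8}},yts")), "\n";
```

Because the field pattern uses *, empty fields such as the middle one in "a,,b" come back as empty strings rather than being skipped.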

        thanks a lot!...that was really useful and informative


      Thanks a lot Sec...it worked. Could you please explain what exactly you are doing here? I tried reading the man pages, but there is not much detail there.

      the help is really appreciated.

      Thanks
      Karthik

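One way to see what the patterns in this thread are doing is to spell the same idea out as an explicit loop. The regex treats a comma as a separator only when it is not inside a {...} group; the recursive part, (??{ $par }), is what lets it keep track of groups nested inside groups. The sketch below does that bookkeeping by hand with a depth counter. It is an illustration only, not the code posted above: split_top_level is a made-up name, and it assumes unbalanced closing braces are simply kept as ordinary characters.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are not inside braces.
sub split_top_level {
    my ($line) = @_;
    my @elems;
    my $depth   = 0;     # Number of '{' currently open.
    my $current = '';    # Field being built.

    for my $ch (split //, $line) {
        if ($ch eq ',' && $depth == 0) {
            push @elems, $current;   # A top-level comma ends the field.
            $current = '';
            next;
        }
        $depth++ if $ch eq '{';
        $depth-- if $ch eq '}' && $depth > 0;
        $current .= $ch;             # Everything else belongs to the field.
    }
    push @elems, $current;           # The last field has no trailing comma.
    return @elems;
}

print join("\n", split_top_level(
    "abc,xyz,{1,2,3,4},18-90-89,{{1,2},{5,6,7,8}},yts")), "\n";
```

The counter plays the role of the recursion: each '{' corresponds to entering the bracket pattern one level deeper, and each '}' to leaving it.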