lebe0024 has asked for the wisdom of the Perl Monks concerning the following question:
I can't figure out a regular expression that will return a parenthesis block in a string, no matter how big/small it is and no matter if it contains other parenthesis blocks.
In other words, if I have a string like '( a ( b ( c ) ( d ) e ) )', I want a regular expression to capture the whole thing, not just '( a ( b ( c )' and not just '( c )'. Can you help, O wise ones?
Re: capturing matching parenthesis
by davido (Cardinal) on Apr 15, 2004 at 20:23 UTC
That's the sort of thing that Text::Balanced is good for. While more recent releases of Perl give the RE engine enough ammunition to accomplish the task, the Text::Balanced module is already a robust implementation designed for your type of problem.
Look, in particular, at the 'extract_bracketed' function.
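A minimal sketch of both routes mentioned above, assuming Perl 5.10 or later for the named recursive pattern. The helper names block_via_text_balanced and block_via_recursive_regex and the variable $paren_re are made up for illustration; only extract_bracketed comes from Text::Balanced. The prefix pattern '[^(]*' tells extract_bracketed to skip anything before the first opening parenthesis.

```perl
use 5.010;
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: first balanced (...) block via Text::Balanced.
sub block_via_text_balanced {
    my ($text) = @_;
    # '()'    : the bracket pair to balance
    # '[^(]*' : prefix pattern to skip before the opening bracket
    my ($block, $remainder) = extract_bracketed($text, '()', '[^(]*');
    return $block;
}

# Hypothetical pattern: a named group that recurses into itself (Perl 5.10+).
my $paren_re = qr/ (?<paren> \( (?: [^()]++ | (?&paren) )* \) ) /x;

# Hypothetical helper: first balanced (...) block via the recursive regex.
sub block_via_recursive_regex {
    my ($text) = @_;
    return $text =~ $paren_re ? $+{paren} : undef;
}

my $sample = 'foo ( a ( b ( c ) ( d ) e ) ) bar';
print block_via_text_balanced($sample),   "\n";
print block_via_recursive_regex($sample), "\n";
```

Both calls should print the whole outer block, '( a ( b ( c ) ( d ) e ) )'. The regex variant as written does not treat backslash-escaped parentheses specially.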
Re: capturing matching parenthesis
by diotalevi (Canon) on Apr 15, 2004 at 20:21 UTC
Re: capturing matching parenthesis
by Stevie-O (Friar) on Apr 16, 2004 at 05:03 UTC
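use strict;
use warnings;
use Regexp::Common qw(balanced);
my $foo = 'x ( a ( b ( c ) ( d ) e ) ) y';   # sample input, for illustration only
# By default $RE{balanced} matches a balanced ( ... ) group
if ($foo =~ / ( $RE{balanced} ) /x) {
    print "grabbed $1\n";
}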
If you look at the documentation, you'll see that it can actually be used very flexibly -- it can handle proper nesting of mixed brackets (e.g. () and {}).
Oh, and beware the lowercase 'b' -- took me a few confused tries to install it from CPAN till I noticed that caveat ;)
--Stevie-O
$"=$,,$_=q>|\p4<6 8p<M/_|<('=>
.q>.<4-KI<l|2$<6%s!<qn#F<>;$,
.=pack'N*',"@{[unpack'C*',$_]
}"for split/</;$_=$,,y[A-Z a-z]
{}cd;print lc
Re: capturing matching parenthesis
by muba (Priest) on Apr 15, 2004 at 21:05 UTC
I'd say not to use a regexp. Just walk over the string byte by byte. Keep a $parenthesis counter: ++ it when you encounter a (, -- it when you encounter a ). Stop when $parenthesis has been > 0 and is back to 0 again.
Good luck.
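The counting approach could be sketched roughly like this. The sub name scan_paren_block and the $start index are assumptions added for illustration; $depth plays the role of the $parenthesis variable described above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the first balanced (...) block in $text,
# or nothing if there is no '(' or the block never closes.
sub scan_paren_block {
    my ($text) = @_;
    my $depth = 0;    # the $parenthesis counter from the post
    my $start;        # offset of the outermost '('
    for my $i (0 .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        if ($ch eq '(') {
            $start = $i if $depth == 0;
            $depth++;
        }
        elsif ($ch eq ')' && $depth > 0) {
            $depth--;
            # Counter was > 0 and is back to 0: the block is complete.
            return substr($text, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return;
}

print scan_paren_block('x ( a ( b ( c ) ( d ) e ) ) y'), "\n";
# prints: ( a ( b ( c ) ( d ) e ) )
```

A stray ')' seen before any '(' is simply ignored in this sketch; whether that is the right behaviour depends on the input you expect.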
|
Fine, unless you care about the possibility of escaped parens. For example, '( ( \( two ) )' would fail with your method unless you were specifically watching for that sort of thing with additional logic.
|
Hmm, I was indeed not taking care of backwhacks.
But then, the method would not change that much. Also keep track of another variable, let's call it $escape. Set it to 1 if you find a \. Then, in the next iteration of the loop, if that character is a ( or ) and $escape != 0, ignore it. If it is a backslash, ignore it too. Then set $escape back to 0.
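A sketch of the counter with the extra $escape flag, under the assumption that a backslash escapes exactly the one character that follows it. The sub name scan_paren_block_escaped and the $start index are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: like a plain depth counter, but any character that
# follows a backslash (a paren or another backslash) is ignored.
sub scan_paren_block_escaped {
    my ($text) = @_;
    my $depth  = 0;    # nesting counter
    my $escape = 0;    # 1 when the previous character was an unescaped '\'
    my $start;         # offset of the outermost '('
    for my $i (0 .. length($text) - 1) {
        my $ch = substr($text, $i, 1);
        if ($escape) {
            $escape = 0;    # this character is escaped: skip it
            next;
        }
        if ($ch eq '\\') {
            $escape = 1;
        }
        elsif ($ch eq '(') {
            $start = $i if $depth == 0;
            $depth++;
        }
        elsif ($ch eq ')' && $depth > 0) {
            $depth--;
            return substr($text, $start, $i - $start + 1) if $depth == 0;
        }
    }
    return;
}

print scan_paren_block_escaped('( ( \( two ) )'), "\n";
# prints: ( ( \( two ) )
```

With the escaped '(' skipped, the counter sees two opens and two closes, so the example string from the objection above is returned whole.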
Re: capturing matching parenthesis
by perlinux (Deacon) on Apr 16, 2004 at 08:18 UTC
No regex comes to mind :-( I think your string is a kind of tree with nodes, and your leaves are the innermost letters. It would be possible to build a C-like struct, isolating every level of nodes, with the leaves at the last level.
( a ( b ( c ) ( d ) e ) )
Graphically:
      a
     / \
    b   e
   / \
  c   d
Excuse my English.
Re: capturing matching parenthesis
by melora (Scribe) on Apr 16, 2004 at 17:59 UTC
The times I've had to cope with nested parentheses, I've used recursion: call the recursive routine when you encounter the open paren (excellent point about the "\(" sequence, by the way), and return when you encounter the close paren (of course, the escape sequence applies there, too), resuming in the original string at the point beyond the close paren. I do like the idea of populating a tree structure with it, too, depending on the analysis you need to do on the whole expression. But then, I'm an old C programmer.
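One way the recursive approach might look when it also fills a tree, as a rough sketch. The sub name parse_nested is made up, tokens are simplified to single non-whitespace characters, and the tree is just nested array references rather than a C-like struct.

```perl
use strict;
use warnings;

# Hypothetical helper: parse $text starting at offset $$pos into a nested
# array ref. Each '(' starts a recursive call; each ')' returns to the caller,
# which resumes just past the close paren.
sub parse_nested {
    my ($text, $pos) = @_;
    my @node;
    while ($$pos < length $text) {
        my $ch = substr($text, $$pos++, 1);
        if ($ch eq '\\') {
            # Escaped character: take the next one literally, if there is one.
            push @node, substr($text, $$pos++, 1) if $$pos < length $text;
        }
        elsif ($ch eq '(') {
            push @node, parse_nested($text, $pos);    # descend one level
        }
        elsif ($ch eq ')') {
            return \@node;                            # close this level
        }
        elsif ($ch =~ /\S/) {
            push @node, $ch;                          # single-character leaf
        }
    }
    return \@node;    # end of input (top level, or an unclosed block)
}

my $pos  = 0;
my $tree = parse_nested('( a ( b ( c ) ( d ) e ) )', \$pos);
# $tree is [ [ 'a', [ 'b', ['c'], ['d'], 'e' ] ] ]
print $tree->[0][1][0], "\n";    # prints: b
```

The sketch does not report unbalanced input; an unclosed block is silently closed at the end of the string, and a stray ')' at the top level ends parsing early.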