PerlMonks
Re: Recursive regular expression weirdness
by hv (Prior) on Mar 30, 2006 at 01:39 UTC ( [id://540091]=note )
Your first test (slightly reformatted) had:
Note that you can put regexp flags at the end of a qr() expression just as with a normal regexp, so this is the same:
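The original snippets are not reproduced here, so as a minimal sketch of the point about flags (the pattern and the variable names `$embedded` and `$trailing` are illustrative assumptions, not the code from the question), these two forms should behave identically:

```perl
use strict;
use warnings;

# Flags embedded inside the pattern ...
my $embedded = qr/(?x) \( [^()]* \) /;

# ... versus the same flag given after the closing delimiter of qr().
my $trailing = qr/ \( [^()]* \) /x;

for my $s ('a (b) c', 'no parens') {
    my $e = $s =~ $embedded ? 1 : 0;
    my $t = $s =~ $trailing ? 1 : 0;
    print "$s: embedded=$e trailing=$t\n";
}
```

Both forms ignore the literal whitespace in the pattern, so either can be used to lay out a longer recursive expression readably.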
The regexp that is being recursively repeated is "find an open/close paren pair with valid nesting of any parens between". Since the match was unanchored, this will locate the first starting point that works; "contains (im(balanced) parens" would therefore match "(balanced)", for example.
The (??{$rxNest}) is called a "deferred eval". When the main /$rxNest/ is compiled, this just appears as a code block in the compiled form, and the compiled form, among other things, needs to know how many capturing parens there are in the pattern. When the deferred eval is invoked, the resulting regular expression is independent of the original one from which it was called. That means in particular that the deferred expression has its own capture groups numbering from $1, and these are not available to the parent expression when it returns.
Your attempt to capture the nested strings with a code block was along the right lines, but to cope with backtracking you need to take advantage of the fact that local() will do the right thing: anything localised inside the regexp is undone when the engine backtracks past it. The easy solution is to localise the whole list, but it is more efficient to localise just one element at a time.
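A sketch of the local() approach, under stated assumptions: the names `@stack`, `@captured` and `nested_groups` are hypothetical and not from the original post, package variables are used because local() does not apply to lexicals, and the final copy is needed because the localisations are undone once the match finishes. This shows the "localise the whole list" variant.

```perl
use strict;
use warnings;

# Package variables (assumed names), since local() needs them.
our @stack;      # grows during the match, unwound on backtracking
our @captured;   # copy taken once the overall match has succeeded

my $rxNest;
$rxNest = qr{
    \(
    (                                   # $1 of this invocation only
        (?: [^()]+ | (??{ $rxNest }) )*
    )
    \)
    (?{ local @stack = (@stack, $1) })  # localise the whole list
}x;

# Hypothetical helper: contents of each balanced group in the first
# balanced run of the string, in the order the groups are closed.
sub nested_groups {
    my ($str) = @_;
    @captured = ();
    $str =~ m{ (??{ $rxNest }) (?{ @captured = @stack }) }x;
    return @captured;
}

print join(' | ', nested_groups('contains (im(balanced) parens')), "\n";
print join(' | ', nested_groups('x (a(b)c) y')), "\n";
```

The one-element-at-a-time variant would presumably replace the code block with something along the lines of `(?{ local $stack[@stack] = $1 })`, which avoids copying the list at every step; treat that as an untested suggestion.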
Going on to the second problem, which was to try and make the match fail if there were imbalanced brackets, I thought the best way would be to add stuff to anchor the match to the beginning and end of the string. When using recursion, it is vital to understand what is the repeated part of the recursion. If you have anchors in the repeated part, it probably won't do what you want - it is equivalent to a regexp like m{^ text ^ more}x. So you need to take the anchors out of the repeated part, which is as simple as:
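As an illustration of keeping the anchors out of the repeated part (a sketch, not the original code: `$rxOnlyNest` and `$rxBad` are assumed names, and `\A`/`\z` stand in for `^`/`$`), the recursive pattern is wrapped once by an anchored outer pattern. The second pattern puts the anchors inside the recursion for contrast.

```perl
use strict;
use warnings;

# The repeated part: one balanced pair, with no anchors in it.
my $rxNest;
$rxNest = qr{ \( (?: [^()]+ | (??{ $rxNest }) )* \) }x;

# Anchors applied once, outside the repeated part.
my $rxOnlyNest = qr{ \A $rxNest \z }x;

# For contrast: anchors inside the repeated part, so every nested
# invocation also demands start-of-string and end-of-string.
my $rxBad;
$rxBad = qr{ \A \( (?: [^()]+ | (??{ $rxBad }) )* \) \z }x;

for my $s ('(a(b)c)', '(a(b)c', '(abc)') {
    printf "%-8s outside=%d inside=%d\n",
        $s,
        ($s =~ $rxOnlyNest ? 1 : 0),
        ($s =~ $rxBad      ? 1 : 0);
}
```

With the anchors inside the recursion, only strings without nesting can match, because the inner `\A` can never succeed in the middle of the string.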
Not sure if I covered all your points here. As diotalevi says, the question would have been better shorter: either using shorter examples or splitting it into multiple posts would help.
Hugo