Beefy Boxes and Bandwidth Generously Provided by pair Networks
Perl: the Markov chain saw
 
PerlMonks  

Re: Recursive regular expression weirdness

by hv (Prior)
on Mar 30, 2006 at 01:39 UTC ( [id://540091]=note: print w/replies, xml ) Need Help??


in reply to Recursive regular expression weirdness

Your first test (slightly reformatted) had:

$rxNest = qr{(?x) \( (?: (?>[^()]+) | (??{$rxNest}) )* \) }; $string =~ /$rxNest/;

Note that you can put regexp flags at the end of a qr() expression just as with a normal regexp, so this is the same:

$rxNest = qr{ \( (?: (?>[^()]+) | (??{$rxNest}) )* \) }x;

The regexp that is being recursively repeated is "find an open/close paren pair with valid nesting of any parens between". Since the match was unanchored, this will locate the first starting point that works; "contains (im(balanced) parens" would therefore match "(balanced)", for example.

The (??{$rxNest}) is called a "deferred eval". When the main /$rxNest/ is compiled, this just appears as a code block in the compiled form - and the compiled form, among other things, needs to know how many capturing parens there are in the pattern. When the deferred eval is invoked the resulting regular expression is independent of the original one from which it was called. That means in particular that the deferred expression has its own capture groups numbering from $1, and these are not available to the parent expression when it returns.

Your attempt to capture the nested strings with a code block was along the right lines, but to cope with backtracking you need to take advantage of the fact that local() will do the right thing. The easy solution is to localise the list:

(?{ local @memoList = (@memoList, $+) })
, but more efficient is to localise just one element at a time:
(?{ local $offset = $offset + 1; local $memoList[$offset] = $+ })

Going on to the second problem, which was to try and make the match fail if there were imbalanced brackets, I thought the best way would be to add stuff to anchor the match to the beginning and end of the string.

When using recursion, it is vital to understand what is the repeated part of the recursion. If you have anchors in the repeated part, it probably won't do what you want - it is equivalent to a regexp like m{^ text ^ more}x.

So you need to take the anchors out of the repeated part, which is as simple as:

$string =~ /^$rxNest\z/;

Not sure if I covered all your points here. As diotalevi says, it would be better shorter - either using shorter examples, or splitting into multiple posts would be better.

Hugo

Replies are listed 'Best First'.
Re^2: Recursive regular expression weirdness
by johngg (Canon) on Mar 30, 2006 at 11:14 UTC
    Thank you again for very clear explanations and suggestions which I will take away and try out. With luck this will not result in another post by way of a cry for help.

    I take your and diotalevi's point about the length of the post. As I was typing I kept thinking "this is getting too long." I should have split it in two. But I am also aware of the number of posts where not enough information is given and Monks are left guessing what the problem might be.

    Anyway, thanks again,

    JohnGG

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://540091]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others pondering the Monastery: (5)
As of 2024-04-20 00:13 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found