
Your first test (slightly reformatted) had:

  $rxNest = qr{(?x)
    \(
    (?: (?>[^()]+) | (??{$rxNest}) )*
    \)
  };
  $string =~ /$rxNest/;

Note that you can put regexp flags at the end of a qr() expression just as with a normal regexp, so this is the same:

  $rxNest = qr{
    \(
    (?: (?>[^()]+) | (??{$rxNest}) )*
    \)
  }x;

The regexp that is being recursively repeated is "find an open/close paren pair with valid nesting of any parens between". Since the match was unanchored, this will locate the first starting point that works; "contains (im(balanced) parens" would therefore match "(balanced)", for example.
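A minimal sketch of that behaviour, assuming $rxNest is a package variable (declared with our) so that the code block can see it at match time. The outer capturing parens and the $first variable are only there for illustration:

```perl
use strict;
use warnings;

# Package variable, so the (??{ ... }) block finds the pattern at match time.
our $rxNest;
$rxNest = qr{
    \(
    (?: (?>[^()]+) | (??{ $rxNest }) )*
    \)
}x;

# Unanchored: the attempt starting at the first "(" fails, because that
# paren is never closed, so the engine moves on to the next "(".
my $string = 'contains (im(balanced) parens';
my ($first) = $string =~ /($rxNest)/;
print "$first\n";    # prints "(balanced)"
```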

The (??{$rxNest}) construct is a "deferred eval" (perlre calls it a postponed regular subexpression). When the main /$rxNest/ is compiled, it appears in the compiled form only as a code block, yet the compiled form must already know, among other things, how many capturing parens the pattern contains. So when the deferred eval is invoked, the resulting regular expression is independent of the one that called it. In particular, the deferred expression has its own capture groups numbered from $1, and these are not available to the parent expression when it returns.
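To see the independence of the capture groups, here is a small sketch. The names $inner, $without and $with are hypothetical, chosen just for this illustration:

```perl
use strict;
use warnings;

# The deferred pattern has a capture group of its own.
our $inner = qr/(\w+)/;

# The parent pattern has no groups, so $1 is not set even though the
# deferred pattern captured something while it ran.
my $without = 'abc' =~ /^(??{ $inner })\z/ ? $1 : 'no match';

# Wrapping the deferred eval in a group that belongs to the parent
# makes the matched text available as the parent's own $1.
my $with = 'abc' =~ /^((??{ $inner }))\z/ ? $1 : 'no match';

print defined $without ? "without: $without\n" : "without: undef\n";
print "with: $with\n";
```

If you need the text matched by the deferred expression, putting the parent's own parens around it is the simplest way to get at it.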

Your attempt to capture the nested strings with a code block was along the right lines, but to cope with backtracking you need to take advantage of the fact that local() will do the right thing. The easy solution is to localise the list:

  (?{ local @memoList = (@memoList, $+) })

A more efficient approach is to localise just one element at a time:

  (?{ local $offset = $offset + 1; local $memoList[$offset] = $+ })
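Put together, the list-localising version might look something like the sketch below. The capturing group around the balanced text, the @found array and the final copying block are my assumptions about how you would wire it up, not something taken from your code. The copy is needed because the localised values are restored as soon as the match finishes:

```perl
use strict;
use warnings;

# local() only works on package variables, hence "our" rather than "my".
our (@memoList, @found);
our $rxNest;

# Group 1 captures one balanced group; the code block after it appends
# that text to a localised copy of the list, which backtracking undoes.
$rxNest = qr{
    (
        \(
        (?: (?>[^()]+) | (??{ $rxNest }) )*
        \)
    )
    (?{ local @memoList = (@memoList, $+) })
}x;

# Copy the localised list into an ordinary variable before the match ends.
my $string = 'a (b (c) d) e';
if ($string =~ /$rxNest (?{ @found = @memoList })/x) {
    print "$_\n" for @found;    # innermost group first
}
```

The inner group finishes matching before the outer one does, so the captures come out innermost first.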

Going on to the second problem, which was to try and make the match fail if there were imbalanced brackets, I thought the best way would be to add stuff to anchor the match to the beginning and end of the string.

When using recursion, it is vital to understand which part of the pattern is repeated. If you have anchors in the repeated part, it probably won't do what you want: every nested invocation has to satisfy the anchors too, so it is equivalent to a regexp like m{^ text ^ more}x.
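As an illustration of the trap, here is a sketch with the anchors placed inside the repeated part. $rxBad is a hypothetical name for this deliberately broken variant:

```perl
use strict;
use warnings;

# Deliberately wrong: the anchors are part of what gets repeated.
our $rxBad;
$rxBad = qr{
    ^
    \(
    (?: (?>[^()]+) | (??{ $rxBad }) )*
    \)
    \z
}x;

# Without nesting the recursive branch is never needed, so this matches.
my $flat = '(flat)' =~ $rxBad ? 'match' : 'no match';

# With nesting, the inner invocation starts mid-string, where ^ cannot
# match, so the whole match fails although the parens are balanced.
my $nested = '(outer(inner))' =~ $rxBad ? 'match' : 'no match';

print "flat: $flat, nested: $nested\n";
```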

So you need to take the anchors out of the repeated part, which is as simple as:

  $string =~ /^$rxNest\z/;
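A quick check of the anchored form against a few sample strings. The strings and the %result hash are made up for this sketch. Note that with both anchors in place the whole string has to be a single balanced group:

```perl
use strict;
use warnings;

our $rxNest;
$rxNest = qr{
    \(
    (?: (?>[^()]+) | (??{ $rxNest }) )*
    \)
}x;

# The anchors live in the outer pattern only, not in the repeated part.
my %result;
for my $string ('(a(b)c)', '(a(b)c', 'x(a)') {
    $result{$string} = $string =~ /^$rxNest\z/ ? 'balanced' : 'rejected';
    print "$string: $result{$string}\n";
}
```

Only the first string is accepted: the second never closes its outer paren, and the third has text before the opening paren.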

Not sure if I covered all your points here. As diotalevi says, the post would have been better shorter, either by using shorter examples or by splitting it into multiple posts.

Hugo

In reply to Re: Recursive regular expression weirdness by hv
in thread Recursive regular expression weirdness by johngg
