
I am performing a substitution on strings that fall between two anchors. A certain substring -- say, "cd" -- may or may not appear as part of the string I'm matching. If it does, I need to capture it.

In my examples below, the anchors are commas, but in reality they are complex regular expressions, so I can't just use [^,]* to avoid steamrolling over them.

In essence, the substitution will be some variation on

  s/,.*(cd)?.*,/=$1=/

Making the two .* subexpressions nongreedy while keeping the (cd)? greedy would seem to express exactly what I'm trying to do, except for the slight inconvenience that it doesn't work:

  echo ,abcdefg,abcdefg | perl -pe 's/,.*?(cd)?.*?,/=$1=/'

I want a captured "cd" between the two equal signs, but $1 remains empty. The problem seems to be that .*? is not merely nongreedy but also unaccommodating: it won't even consume enough to allow the following greedy subexpression to match. It's not clear to me why a nongreedy expression would consume enough to match a required subexpression (i.e. if I omitted the ? after (cd)), but not enough to match an optional but greedy one.

I can make the expression capture the "cd" if I make my first subexpression a little more explicit:

  echo ,abcdefg,abcdefg, | perl -pe 's/,(?:(?!cd).)*(cd)?.*?,/=$1=/'

This gives me the desired output of "=cd=abcdefg,". But this fails if the part between the anchors does not contain a "cd":

  echo ,abcefg,abcdefg, | perl -pe 's/,(?:(?!cd).)*(cd)?.*?,/=$1=/'

Here, the desired output is "==abcdefg,", but the greedy subexpression ignores the anchor boundary and goes into the section of the string following it to find a "cd".

I've tried various other things but not yet found something that works. How do I get the $1 to be populated with a "cd" if it appears in the string, and remain empty if it doesn't, while staying between the anchors?

In reply to greedy subexpression between two nongreedy ones by raygun
