raygun has asked for the wisdom of the Perl Monks concerning the following question:
I am performing a substitution on strings that fall between two anchors. A certain substring -- say, "cd" -- may or may not appear as part of the string I'm matching. If it does, I need to capture it.
In my examples below, the anchors are commas, but in reality they are complex regular expressions, so I can't just use [^,]* to avoid steamrolling over them.
In essence, the substitution will be some variation on
s/,.*(cd)?.*,/=$1=/
Making the two .* subexpressions nongreedy while keeping the (cd)? greedy would seem to express exactly what I'm trying to do, except for the slight inconvenience that it doesn't work:
echo ,abcdefg,abcdefg | perl -pe 's/,.*?(cd)?.*?,/=$1=/'
I want a captured "cd" between the two equal signs, but $1 remains empty. The problem seems to be that .*? is not merely nongreedy but also unaccommodating: it won't even consume enough to allow the following greedy subexpression to match. It's not clear to me why a nongreedy expression would consume enough to match a required subexpression (i.e. if I omitted the ? after (cd)), but not enough to match an optional but greedy one.
I can make the expression capture the "cd" if I make my first subexpression a little more explicit:
echo ,abcdefg,abcdefg, | perl -pe 's/,(?:(?!cd).)*(cd)?.*?,/=$1=/'
This gives me the desired output of "=cd=abcdefg,". But it fails if the part between the anchors does not contain a "cd":
echo ,abcefg,abcdefg, | perl -pe 's/,(?:(?!cd).)*(cd)?.*?,/=$1=/'
Here, the desired output is "==abcdefg,", but the greedy subexpression ignores the anchor boundary and goes into the section of the string following it to find a "cd".
I've tried various other things but not yet found something that works. How do I get the $1 to be populated with a "cd" if it appears in the string, and remain empty if it doesn't, while staying between the anchors?