matching and mysterious captures

Allasso has asked for the wisdom of the Perl Monks concerning the following question:

This really has me puzzled. Say I have files that may contain the sequence "one\ntwo":

echo -e 'zero\none\ntwo\n\n    three'
zero
one
two

    three

I want to recurse through the files and, if a file contains the block "one\ntwo", put identifiers around it with a fixed number of newlines on each side, regardless of how many there were before:

echo -e 'zero\none\ntwo\n\n    three' | perl -0777 -pe 's{\s*(one\ntwo)\s*}{\n\n<begin block>\n\n$1\n\n<end block>\n\n}'
zero

<begin block>

one
two

<end block>

three

Great, but I also want to preserve the indentation of the first word of the text that follows the block. So I think, no problem, I'll just do this:

echo -e 'zero\none\ntwo\n\n    three' | perl -0777 -pe 's{\s*(one\ntwo)\s*?( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}'
zero

<begin block>

one
two

<end block>



    three

It preserves the indent, but now I have extra blank lines after <end block>. Where did they come from? The second capture only matches spaces, so how did two extra \n's get inserted?
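
The post says the real goal is to apply this to files found by recursing through a directory tree. A minimal sketch of that loop in plain Perl, assuming the core File::Find module and reusing the substitution from the first one-liner (swap in whichever pattern you settle on). The helper name wrap_blocks_in_tree and the script name wrap_blocks.pl are hypothetical:

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk every regular file below $root, apply the
# substitution from the first one-liner, and rewrite only the files that
# actually changed. Returns the list of rewritten paths.
sub wrap_blocks_in_tree {
    my ($root) = @_;
    my @changed;
    find(
        {
            no_chdir => 1,    # keep $File::Find::name usable as a full path
            wanted   => sub {
                my $path = $File::Find::name;
                return unless -f $path;

                # Slurp the whole file, as perl -0777 does.
                open my $in, '<', $path or die "Cannot read $path: $!";
                my $text = do { local $/; <$in> };
                close $in;

                # Same pattern and replacement as the first one-liner.
                my $n = $text =~ s{\s*(one\ntwo)\s*}{\n\n<begin block>\n\n$1\n\n<end block>\n\n};
                return unless $n;

                open my $out, '>', $path or die "Cannot write $path: $!";
                print {$out} $text;
                close $out or die "Cannot close $path: $!";
                push @changed, $path;
            },
        },
        $root,
    );
    return @changed;
}

# Usage (hypothetical script name): perl wrap_blocks.pl DIR [DIR ...]
for my $dir (@ARGV) {
    print "$_\n" for wrap_blocks_in_tree($dir);
}
```

This rewrites files in place without keeping a backup, so it is worth trying on a copy of the tree first.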

Re: matching and mysterious captures by kennethk (Abbot) on May 05, 2011 at 15:59 UTC
If you modify your code to

echo -e 'zero\none\ntwo\n\n    three' | perl -0777 -pe 's{\s*(one\ntwo)\s*?( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\nx$2x}'

so you have a clear delimiter around your second match, you get:

zero

<begin block>

one
two

<end block>

xx

    three

By making your whitespace match non-greedy, you are no longer consuming the newlines preceding 'three'. You can get your expected result by using the multiline modifier (see Modifiers in perlre) combined with the line-start metacharacter ^:

echo -e 'zero\none\ntwo\n\n    three' | perl -0777 -pe 's{\s*(one\ntwo)\s*^( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}m'

yields

zero

<begin block>

one
two

<end block>

    three
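
To see exactly what each pattern consumes, you can also read the match offsets from @- and @+ instead of wrapping $2 in markers. A small sketch; explain_match and visible are hypothetical helpers, not part of any module:

```perl
use strict;
use warnings;

# Hypothetical helper: run $re against $text and report which characters the
# match consumed, what $2 captured, and what was left untouched after it.
sub explain_match {
    my ($text, $re) = @_;
    return unless $text =~ $re;
    # $-[0] / $+[0] are the start / end offsets of the whole match.
    my ($start, $end, $indent) = ($-[0], $+[0], $2);
    return {
        consumed => substr($text, $start, $end - $start),
        indent   => $indent,
        left     => substr($text, $end),
    };
}

# Show newlines as \n and spaces as dots so whitespace is visible.
sub visible {
    my ($s) = @_;
    $s =~ s/\n/\\n/g;
    $s =~ s/ /./g;
    return $s;
}

my $text = "zero\none\ntwo\n\n    three\n";
for my $case (
    [ 'non-greedy \s*?( *)', qr/\s*(one\ntwo)\s*?( *)/ ],
    [ 'greedy \s*^( *) /m',  qr/\s*(one\ntwo)\s*^( *)/m ],
) {
    my ($label, $re) = @$case;
    my $r = explain_match($text, $re) or next;
    printf "%s\n  consumed:  %s\n  \$2:        [%s]\n  left over: %s\n",
        $label, visible($r->{consumed}), visible($r->{indent}), visible($r->{left});
}
```

With the non-greedy pattern the match ends right after "two" and $2 is empty, so the \n\n before "three" stay in the string and end up after the replacement's own \n\n. With the /m version, \s* consumes those two newlines and ^( *) captures the four spaces.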
Re^2: matching and mysterious captures by Allasso (Monk) on May 05, 2011 at 17:15 UTC
I see, I was matching the minimum, which was zero \s's and zero spaces. Your example works, except in this case:

echo -e 'zero\none\ntwo    three' | perl -0777 -pe 's{\s*(one\ntwo)\s*^( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}m'
zero
one
two    three
Re^3: matching and mysterious captures by kennethk (Abbot) on May 05, 2011 at 18:00 UTC
What do you expect to get for the case you've listed? This falls outside any of the cases you've described above; see "I know what I mean. Why don't you?". If you expect to get

zero

<begin block>

one
two

<end block>

    three

which cleans out blank lines but keeps the spaces before three, you can use a non-capturing group (?:...) combined with the ? quantifier to make the whitespace matching conditional:

echo -e 'zero\none\ntwo    three' | perl -0777 -wpe 's{\s*(one\ntwo)(?:\s*^)?( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}m'
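
A quick way to confirm that the conditional group handles both inputs is to run the substitution on in-memory strings instead of through -p. A sketch, wrapping the pattern from this reply in a hypothetical wrap_block sub:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the (?:\s*^)? pattern from the reply above.
# /m lets ^ match just after any newline; if no newline follows "two",
# the optional group matches nothing and ( *) still grabs the spaces.
sub wrap_block {
    my ($text) = @_;
    $text =~ s{\s*(one\ntwo)(?:\s*^)?( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}m;
    return $text;
}

my @inputs = (
    "zero\none\ntwo\n\n    three\n",   # original case: blank line, then indent
    "zero\none\ntwo    three\n",       # Allasso's case: no newline after "two"
);
print wrap_block($_), "---\n" for @inputs;
```

Both inputs come out identical: zero, the wrapped block, one blank line, then three indented by four spaces.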
Re^4: matching and mysterious captures by Allasso (Monk) on May 05, 2011 at 18:51 UTC
Re: matching and mysterious captures by Utilitarian (Vicar) on May 05, 2011 at 16:02 UTC
You put them there:

...<end block>\n\n$2}

print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."