Allasso has asked for the wisdom of the Perl Monks concerning the following question:

This really has me puzzled. Say I have files that may contain the sequence "one\ntwo":

echo -e 'zero\none\ntwo\n\n three'
zero
one
two

 three

I want to recurse through the files, and if they contain the block "one\ntwo", I want to put identifiers around it, and a prescribed number of newlines, regardless of how many there were before:

echo -e 'zero\none\ntwo\n\n three' | perl -0777 -pe 's{\s*(one\ntwo)\s*}{\n\n<begin block>\n\n$1\n\n<end block>\n\n}'
zero

<begin block>

one
two

<end block>

three

Great - but I want to preserve the indentation of the first word of the text that follows. So I think, no problem, I'll just do this:

echo -e 'zero\none\ntwo\n\n three' | perl -0777 -pe 's{\s*(one\ntwo)\s*?( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}'
zero

<begin block>

one
two

<end block>



 three

It preserves the indent, but now I have additional newlines after <end block>. Where did they come from? The second capture is only capturing spaces, so how did two extra \n's get inserted?

Re: matching and mysterious captures
by kennethk (Abbot) on May 05, 2011 at 15:59 UTC
    If you modify your code to

     echo -e 'zero\none\ntwo\n\n    three' | perl -0777 -pe 's{\s*(one\ntwo)\s*?( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\nx$2x}'

    so you have a clear delimiter around your second match, you get:

    zero

    <begin block>

    one
    two

    <end block>

    xx

        three

    By switching your whitespace match to non-greedy (\s*?), you are no longer consuming the newlines preceding 'three'. You can get your expected result by using the multiline modifier (see Modifiers in perlre) combined with the line-start metacharacter ^:

    echo -e 'zero\none\ntwo\n\n    three' | perl -0777 -pe 's{\s*(one\ntwo)\s*^( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}m'

    yields

    zero

    <begin block>

    one
    two

    <end block>

        three
      I see: I was matching the minimum, which was zero \s characters and zero spaces.

      Your example works, except in the case of:
      echo -e 'zero\none\ntwo three' | perl -0777 -pe 's{\s*(one\ntwo)\s*^( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}m'
      zero
      one
      two three
        What do you expect to get for the case you've listed? This falls outside any of the cases you've described above; see "I know what I mean. Why don't you?". If you expect to get

        zero

        <begin block>

        one
        two

        <end block>

            three

        which cleans out blank lines but keeps the spaces before three, you can use a non-capturing group (?:...) combined with the ? quantifier to make the whitespace matching conditional:

         echo -e 'zero\none\ntwo    three' | perl -0777 -wpe 's{\s*(one\ntwo)(?:\s*^)?( *)}{\n\n<begin block>\n\n$1\n\n<end block>\n\n$2}m'

Re: matching and mysterious captures
by Utilitarian (Vicar) on May 05, 2011 at 16:02 UTC
    You put them there
    ...<end block>\n\n$2}

    print "Good ",qw(night morning afternoon evening)[(localtime)[2]/6]," fellow monks."