tel2 has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I understand that the regular expression /m modifier is supposed to let ^ and $ match next to embedded \n.

If that is the case, then why don't these match?

perl -e '$_="ABC\nDEF";s/C$^D/-/m;print'
perl -e '$_="ABC\nDEF";s/C$\n^D/-/m;print'
Instead, they give this output:
ABC
DEF
I know I can make it match like this:
perl -e '$_="ABC\nDEF";s/C\nD/-/;print'
which gives this output:
ABC-DEF
And I didn't even need the /m modifier, but that doesn't answer my question.
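A minimal sketch of what /m does and does not change, as a plain Perl script. It uses only core Perl, and the variable names are made up for illustration. /m widens the set of positions where ^ and $ may match, but both stay zero-width, so the embedded \n still has to be matched by something in the pattern.

```perl
use strict;
use warnings;

my $s = "ABC\nDEF";

# Without /m, ^ only matches at the very start of the string.
my @first_chars   = $s =~ /^(\w)/g;     # ('A')
# With /m, ^ also matches just after each embedded "\n".
my @first_chars_m = $s =~ /^(\w)/mg;    # ('A', 'D')

# Without /m, $ only matches at the end (or before a final "\n").
my @last_chars    = $s =~ /(\w)$/g;     # ('F')
# With /m, $ also matches just before each embedded "\n".
my @last_chars_m  = $s =~ /(\w)$/mg;    # ('C', 'F')

# The anchors test a position but consume nothing, so the "\n"
# must be matched explicitly; no /m is needed for that.
(my $t = $s) =~ s/C\nD/-/;              # "AB-EF"

print "@first_chars | @first_chars_m | @last_chars | @last_chars_m | $t\n";
```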

Thanks.

Re: Why \n matches but not $^?
by tye (Sage) on Oct 13, 2008 at 04:41 UTC

    You've tripped over the most voodoo part of Perl's parsing. $\ (the output record separator) is a variable, so the regex parser has to decide whether $\n means "end-of-line, then newline" or "the contents of $\, then the letter 'n'". Perl prefers the latter interpretation. Several ways of trying to resolve this don't work:

    perl -le'$_="ABC\nDEF"; s/C\$\n^D/-/m; print'
    perl -le'$_="ABC\nDEF"; s/C$[\n]^D/-/m; print'

    But there are several ways to successfully work around the problem:

    perl -le'$_="ABC\nDEF"; s/C$ \n^D/-/mx; print'
    perl -le'$_="ABC\nDEF"; s/C(?:$)\n^D/-/m; print'
    perl -le'$_="ABC\nDEF"; s/C$(?:\n)^D/-/m; print'
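    The three working one-liners can be checked side by side in a script. This is a sketch, and the variable names are invented. The comments rely on perlop's note that $( , $) and $| are not interpolated in patterns, and on /x ignoring the literal space that keeps "$" from being read as "$\".

```perl
use strict;
use warnings;

# /x ignores the space; the space itself already stopped "$" being read as "$\".
(my $fix_space   = "ABC\nDEF") =~ s/C$ \n^D/-/mx;
# "$)" is not interpolated in a pattern, so (?:$) is a plain end-of-line anchor.
(my $fix_dollar  = "ABC\nDEF") =~ s/C(?:$)\n^D/-/m;
# "$(" is not interpolated either, so $ stays an anchor before the group.
(my $fix_newline = "ABC\nDEF") =~ s/C$(?:\n)^D/-/m;

print "$fix_space $fix_dollar $fix_newline\n";   # each one is "AB-EF"
```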

    But they produce "AB-EF", not "ABC-DEF". And the following one would never match anything even if it were parsed the way you expected, since there has to be a \n in the matched string between $ and ^:

    perl -le'$_="ABC\nDEF"; s/C$^D/-/m; print'
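    To see why the $-then-^ form can't match between two ordinary characters, here is a small sketch. It writes the anchors as (?:$)^ so that nothing after the $ can be taken as part of a variable name. It then compares that pattern with a string containing an empty line, where a position really is both end-of-line and start-of-line.

```perl
use strict;
use warnings;

# (?:$)^ needs one position that is both end-of-line and start-of-line.
# Between "C" and "D" there is no such position, so this can never match.
my $never = ("ABC\nDEF" =~ /C(?:$)^D/m) ? 1 : 0;

# In "A\n\nB", the position between the two newlines is both: it follows
# a "\n" (so ^ matches under /m) and precedes a "\n" (so $ matches).
my $blank = ("A\n\nB" =~ /A\n(?:$)^\nB/m) ? 1 : 0;

print "$never $blank\n";   # 0 1
```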

    - tye        

      If you suspect that a regex isn't parsed the way you wanted, you can use re 'debug' (for example via -Mre=debug) to find out:
      $ perl -Mre=debug -wle 'm/C$\n^D/'
      Freeing REx: `","'
      Omitting $` $& $' support.
      EXECUTING...
      Compiling REx `C
      n^D'
      size 6 Got 52 bytes for offset annotations.
      first at 1
      rarest char D at 3
         1: EXACT <C\nn>(3)
         3: BOL(4)
         4: EXACT <D>(6)
         6: END(0)
      anchored "C ...
      You don't have to understand everything to notice that EXACT <C\nn> isn't what you were after. The literal n has to come from the ...\n in the regex, so the character before it (a newline) has to come from the previous token in the regex: $\, which -l sets to "\n".
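      The point that $\ is interpolated and the n is literal can be confirmed from a script by giving $\ different values. This is a sketch, and the sub name is hypothetical. The no warnings 'ambiguous' line is there because newer perls may warn "Possible unintended interpolation of $\ in regex" for exactly this construct.

```perl
use strict;
use warnings;

# Hypothetical helper: does $str match /C$\n/ while $\ holds $ors?
# Perl interpolates $\ here, so the pattern is "C" . $ors . "n",
# not "C", end-of-line, newline.
sub matches_c_dollar_n {
    my ($str, $ors) = @_;
    no warnings 'ambiguous';   # silence the "unintended interpolation" warning
    local $\ = $ors;
    return $str =~ /C$\n/ ? 1 : 0;
}

my @results = (
    matches_c_dollar_n("ABC\nDEF", ""),    # pattern "Cn": no match
    matches_c_dollar_n("ABCnDEF",  ""),    # pattern "Cn": match
    matches_c_dollar_n("ABCXnDEF", "X"),   # pattern "CXn": match
);
print "@results\n";   # 0 1 1
```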

        Thanks. That reminded me of one thing I had been trying to remember to include:

        $ perl -e'print qr/C$\n^D/,$/'
        (?-xism:Cn^D)
        $ perl -le'print qr/C$\n^D/'
        (?-xism:C
        n^D)

        which is a lower-tech way to notice that your regex wasn't parsed the way you expected. (It also demonstrates why I use -l: so I don't have to append newlines to each of my print statements.)
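        The same print qr// check can be scripted by building the pattern under two values of $\ and looking at its string form. This is a sketch with a hypothetical helper name. The stringified prefix differs by Perl version: "(?-xism:" in this thread, "(?^:" on newer perls. The test therefore only looks for the pattern body.

```perl
use strict;
use warnings;

# Hypothetical helper: compile qr/C$\n^D/ with $\ set to $ors and
# return its string form, like "print qr//" in the one-liners.
sub pattern_with_ors {
    my ($ors) = @_;
    no warnings 'ambiguous';   # newer perls warn about $\ inside a regex
    local $\ = $ors;           # "" stands in for "no -l" without an undef warning
    return "" . qr/C$\n^D/;
}

my $no_l   = pattern_with_ors("");     # body is "Cn^D"
my $with_l = pattern_with_ors("\n");   # body is "C", a real newline, then "n^D"
print "$no_l\n$with_l\n";
```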

        - tye        

        Thanks moritz,

        That's very helpful. I didn't know about that re debug module.

        BTW: Why are you and tye using Perl's -l switch?

        Terry

      Thanks heaps, Tye!

      A prompt, accurate and exhaustive response. (Well, I'm almost exhausted from just reading it). Nice work!

      Yes - I meant "AB-EF" when I wrote "ABC-DEF". Sorry - typo.

      PS: I assume the -l switch is superfluous in your answers?
      PPS: Any ideas why those 1st 2 1-liners you gave don't work? They look OK from this angle.

      Thanks again.

        I assume the -l switch is superfluous in your answers?

        You assume wrongly. See perlrun.
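        Here is a sketch of why -l is not superfluous. Per perlrun, -l with no argument sets $\ to the current value of $/ (normally "\n"), and with -n or -p it also chomps each input line. The script below sets $\ by hand and prints to an in-memory filehandle so the effect is visible. Since $\ is also what gets interpolated into the $\n regexes in this thread, -l changes how those regexes are parsed, not just how output looks.

```perl
use strict;
use warnings;

# Capture output in a scalar so the effect of $\ can be inspected.
open my $out, '>', \my $buf or die "can't open in-memory handle: $!";

print {$out} "no ORS";         # $\ is undef by default: nothing appended
{
    local $\ = "\n";           # roughly what -l does for the whole program
    print {$out} "with ORS";   # "\n" appended automatically
}
close $out;

print $buf;   # "no ORSwith ORS\n"
```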

        Any ideas why those 1st 2 1-liners you gave don't work? They look OK from this angle.

        Sure. Why don't you practice the methods moritz and I offered for helping to figure out how a regex was parsed.

        - tye        

      If the m flag causes ^ and $ to gain magic, as in

      m Treat string as multiple lines. That is, change "^" and "$" from matching the start or end of the string to matching the start or end of any line anywhere within the string.

      and $ matches at end-of-line (or before a newline), as in

                 $   Match the end of the line (or before newline at the end)

      then it could be argued that the newline would be matched by $ in

      $ perl -Mre=debug -wle'"ABC\nDEF" =~ m/C(?:$)^D/m'
      Freeing REx: `","'
      Compiling REx `C(?:$)^D'
      size 7 Got 60 bytes for offset annotations.
      first at 1
      rarest char D at 1
         1: EXACT <C>(3)
         3: MEOL(4)
         4: MBOL(5)
         5: EXACT <D>(7)
         7: END(0)
      anchored "CD" at 0 (checking anchored) minlen 2
      Offsets: [7]
              1[1] 0[0] 5[1] 7[1] 8[1] 0[0] 9[0]
      Omitting $` $& $' support.
      EXECUTING...
      Guessing start of match, REx "C(?:$)^D" against "ABC DEF"...
      Did not find anchored substr "CD"...
      Match rejected by optimizer
      Freeing REx: `"C(?:$)^D"'

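      I understand that it doesn't: $ and ^ are zero-width, so the optimizer looks for the literal substring "CD" and rejects the match. Making the newline explicit, as in tye's example, does work (here written as $\?, which -l turns into an optional newline):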

      $ perl -Mre=debug -wle'"ABC\nDEF" =~ m/C(?:$)$\?^D/m'
      Freeing REx: `","'
      Omitting $` $& $' support.
      EXECUTING...
      Compiling REx `C(?:$) ?^D'
      size 11 Got 92 bytes for offset annotations.
      first at 1
      rarest char D at 0
      rarest char C at 0
         1: EXACT <C>(3)
         3: MEOL(4)
         4: CURLY {0,1}(8)
         6:   EXACT <\n>(0)
         8: MBOL(9)
         9: EXACT <D>(11)
        11: END(0)
      anchored "C"$ at 0 floating "D" at 1..2 (checking floating) minlen 2
      Offsets: [11]
              1[1] 0[0] 5[1] 8[1] 0[0] 7[1] 0[0] 9[1] 10[1] 0[0] 11[0]
      Guessing start of match, REx "C(?:$) ?^D" against "ABC DEF"...
      Found floating substr "D" at offset 4...
      Found anchored substr "C"$ at offset 2...
      Starting position does not contradict /^/m...
      Guessed: match at offset 2
      Matching REx "C(?:$) ?^D" against "C DEF"
        Setting an EVAL scope, savestack=6
         2 <AB> <C DEF>  |  1: EXACT <C>
         3 <ABC> < DEF>  |  3: MEOL
         3 <ABC> < DEF>  |  4: CURLY {0,1}
                            EXACT <\n> can match 1 times out of 1...
        Setting an EVAL scope, savestack=6
         4 <ABC > <DEF>  |  8: MBOL
         4 <ABC > <DEF>  |  9: EXACT <D>
         5 <ABC D> <EF>  | 11: END
      Match successful!
      Freeing REx: `"C(?:$)\n?^D"'
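      In the trace, MEOL succeeds at offset 3 without consuming anything, and CURLY then matches the \n. The sketch below shows the same zero-width behaviour without re 'debug', using pos() and \G. It needs only core Perl, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $s = "ABC\nDEF";

# With /g in scalar context, pos() records where the match stopped.
$s =~ /C$/mg or die "no match";
my $after_dollar = pos($s);   # 3: just before the "\n"; $ consumed nothing

# The "\n" is still there, so matching can continue from that position.
my $newline_next = ($s =~ /\G\n^D/mg) ? 1 : 0;

print "$after_dollar $newline_next\n";   # 3 1
```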