Why \n matches but not $^?

tel2 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Why \n matches but not $^? (weight) by tye (Sage) on Oct 13, 2008 at 04:41 UTC
You've tripped over the most voodoo part of Perl's parsing. `$\` is a variable, so the regex parser needs to decide whether `$\n` means "end-of-line, then newline" or "the contents of `$\`, then the letter 'n'". Perl prefers the latter interpretation.

Several ways to try to resolve this don't work:

    perl -le'$_="ABC\nDEF"; s/C\$\n^D/-/m; print'
    perl -le'$_="ABC\nDEF"; s/C$[\n]^D/-/m; print'

But there are several ways to successfully work around the problem:

    perl -le'$_="ABC\nDEF"; s/C$ \n^D/-/mx; print'
    perl -le'$_="ABC\nDEF"; s/C(?:$)\n^D/-/m; print'
    perl -le'$_="ABC\nDEF"; s/C$(?:\n)^D/-/m; print'

But they produce "AB-EF", not "ABC-DEF". And the following one would never match anything even if it were parsed the way you expected, since there has to be a \n in the matched string between $ and ^:

    perl -le'$_="ABC\nDEF"; s/C$^D/-/m; print'

- tye
Re^2: Why \n matches but not $^? (weight) by moritz (Cardinal) on Oct 13, 2008 at 09:06 UTC
If you suspect that a regex isn't parsed the way you wanted it, you can `use re 'debug';` to find out:

    $ perl -Mre=debug -wle 'm/C$\n^D/'
    Freeing REx: `","'
    Omitting $` $& $' support.

    EXECUTING...

    Compiling REx `C n^D'
    size 6 Got 52 bytes for offset annotations.
    first at 1
    rarest char D at 3
       1: EXACT <C\nn>(3)
       3: BOL(4)
       4: EXACT <D>(6)
       6: END(0)
    anchored "C ...

You don't have to understand everything to notice that the `EXACT <C\nn>` isn't what you were after. The literal `n` has to come from the `...\n` in the regex, so the thing before it (a newline) has to come from the previous token in the regex.
Re^3: Why \n matches but not $^? (dump) by tye (Sage) on Oct 14, 2008 at 00:02 UTC
Thanks. That reminded me of one thing I had been trying to remember to include:

    $ perl -e'print qr/C$\n^D/,$/'
    (?-xism:Cn^D)
    $ perl -le'print qr/C$\n^D/'
    (?-xism:C
    n^D)

which is a lower-tech way to notice that your regex wasn't parsed the way you expected. (It also demonstrates why I use `-l`: so I don't have to append newlines to each of my print statements.)

- tye
Re^3: Why \n matches but not $^? (weight) by tel2 (Pilgrim) on Oct 13, 2008 at 22:53 UTC
Thanks moritz, that's very helpful. I didn't know about that re debug module.

BTW: Why are you and tye using Perl's -l switch?

Terry
Re^4: Why \n matches but not $^? (weight) by moritz (Cardinal) on Oct 14, 2008 at 05:57 UTC
Re^2: Why \n matches but not $^? (weight) by tel2 (Pilgrim) on Oct 13, 2008 at 08:52 UTC
Thanks heaps, Tye! A prompt, accurate and exhaustive response. (Well, I'm almost exhausted from just reading it.) Nice work!

Yes - I meant "AB-EF" when I wrote "ABC-DEF". Sorry - typo.

PS: I assume the -l switch is superfluous in your answers?

PPS: Any ideas why those 1st 2 1-liners you gave don't work? They look OK from this angle.

Thanks again.
Re^3: Why \n matches but not $^? (weight) by ikegami (Patriarch) on Oct 13, 2008 at 13:56 UTC
"I assume the -l switch is superfluous in your answers?"

You assume wrongly. See perlrun.
Re^3: Why \n matches but not $^? (practice) by tye (Sage) on Oct 14, 2008 at 00:14 UTC
"Any ideas why those 1st 2 1-liners you gave don't work? They look OK from this angle."

Sure. Why don't you practice the methods moritz and I offered for helping to figure out how a regex was parsed.

- tye
Re^2: Why \n matches but not $^? (weight) by procura (Beadle) on Oct 13, 2008 at 21:32 UTC
If 'm' as flag causes ^ and $ to gain magic as in

    m   Treat string as multiple lines. That is, change "^" and "$" from
        matching the start or end of the string to matching the start or
        end of any line anywhere within the string.

and `$` matches end-of-line (or before newline) as in

    $   Match the end of the line (or before newline at the end)

it might be arguable that the newline would be matched by `$` in

    $ perl -Mre=debug -wle'"ABC\nDEF" =~ m/C(?:$)^D/m'
    Freeing REx: `","'
    Compiling REx `C(?:$)^D'
    size 7 Got 60 bytes for offset annotations.
    first at 1
    rarest char D at 1
       1: EXACT <C>(3)
       3: MEOL(4)
       4: MBOL(5)
       5: EXACT <D>(7)
       7: END(0)
    anchored "CD" at 0 (checking anchored) minlen 2
    Offsets: [7]
        1[1] 0[0] 5[1] 7[1] 8[1] 0[0] 9[0]
    Omitting $` $& $' support.

    EXECUTING...

    Guessing start of match, REx "C(?:$)^D" against "ABC DEF"...
    Did not find anchored substr "CD"...
    Match rejected by optimizer
    Freeing REx: `"C(?:$)^D"'

I understand that it doesn't. Making the newline explicit as in tye's example does:

    $ perl -Mre=debug -wle'"ABC\nDEF" =~ m/C(?:$)$\?^D/m'
    Freeing REx: `","'
    Omitting $` $& $' support.

    EXECUTING...

    Compiling REx `C(?:$) ?^D'
    size 11 Got 92 bytes for offset annotations.
    first at 1
    rarest char D at 0
    rarest char C at 0
       1: EXACT <C>(3)
       3: MEOL(4)
       4: CURLY {0,1}(8)
       6:   EXACT <\n>(0)
       8: MBOL(9)
       9: EXACT <D>(11)
      11: END(0)
    anchored "C"$ at 0 floating "D" at 1..2 (checking floating) minlen 2
    Offsets: [11]
        1[1] 0[0] 5[1] 8[1] 0[0] 7[1] 0[0] 9[1] 10[1] 0[0] 11[0]
    Guessing start of match, REx "C(?:$) ?^D" against "ABC DEF"...
    Found floating substr "D" at offset 4...
    Found anchored substr "C"$ at offset 2...
    Starting position does not contradict /^/m...
    Guessed: match at offset 2
    Matching REx "C(?:$) ?^D" against "C DEF"
      Setting an EVAL scope, savestack=6
       2 <AB> <C DEF>     |  1:  EXACT <C>
       3 <ABC> < DEF>     |  3:  MEOL
       3 <ABC> < DEF>     |  4:  CURLY {0,1}
                                 EXACT <\n> can match 1 times out of 1...
      Setting an EVAL scope, savestack=6
       4 <ABC > <DEF>     |  8:    MBOL
       4 <ABC > <DEF>     |  9:    EXACT <D>
       5 <ABC D> <EF>     | 11:    END
    Match successful!
    Freeing REx: `"C(?:$)\n?^D"'