Re: Why multiline regex doesn't work?

In your second regex, you achieve no match because the regex expression [.\n] does not mean what (I think) you think it means. There is also another problem with a predefined special variable $[ that is being interpolated instead of the first part of the $[.\n] regex expression you intended.

c:\@Work\Perl\monks>perl -le
"use warnings;
 use strict;
 ;;
 my $s = qq{aaa       : AAA\n}
       . qq{bbb       : BBB\n}
       . qq{ccc       : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~
   s/[.\n]*?^$m *: (.*)$[.\n]*/$1/rm
   ;
 ;;
 print qq{[[$t]]};
"
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]

The '.' (period) character is not special, i.e., not a metacharacter, in a [] regex character class; it just matches a literal period, and there are no such characters in your $s test string.
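As a quick check of that claim, here is a minimal sketch (the helper name `hits` is made up for this illustration) that tests single characters against `[.]` and `[.\n]`:

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the single character $char matches $rx, else 0.
sub hits { my ($char, $rx) = @_; return $char =~ $rx ? 1 : 0 }

my $dot_class = qr/\A[.]\z/;      # a literal period, nothing else
my $dot_nl    = qr/\A[.\n]\z/;    # a literal period or a newline

print "[.]   vs '.' : ", hits('.',  $dot_class), "\n";   # 1
print "[.]   vs 'a' : ", hits('a',  $dot_class), "\n";   # 0
print "[.\\n] vs \\n  : ", hits("\n", $dot_nl),    "\n";   # 1
print "[.\\n] vs 'b' : ", hits('b',  $dot_nl),    "\n";   # 0
```

So `[.\n]*` can only consume runs of periods and newlines, which is why it never gets past the letters in `$s`.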

I'm not sure what the [.\n] expression was intended to represent (maybe [^\n] "anything but a newline"?), so I can't comment further until you can provide greater clarity. Note, however, that disambiguating the $ metacharacter at least produces a different output, i.e., a match and substitution, even though the output is still not what you expect:

c:\@Work\Perl\monks>perl -le
"use warnings;
 use strict;
 ;;
 my $s = qq{aaa       : AAA\n}
       . qq{bbb       : BBB\n}
       . qq{ccc       : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~
   s/[.\n]*?^$m *: (.*)$(?:[.\n]*)/$1/rm
   ;
 ;;
 print qq{[[$t]]};
"
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]
[[aaa       : AAABBBccc       : CCC
]]

(There is no warning because $[ has a default initialized value.)

Update: Note that the ambiguity of $[.\n] (regex) and the $[ predefined special variable (see perlvar) is yet another argument in favor of the /x embedded whitespace regex modifier (other than simply being able to see the darn regex). Consider:

c:\@Work\Perl\monks>perl -le
"use warnings;
 use strict;
 ;;
 my $s = qq{aaa       : AAA\n}
       . qq{bbb       : BBB\n}
       . qq{ccc       : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~
   s/ [.\n]*? ^ $m [ ]* : [ ] (.*) $ [.\n]* /$1/xrm
   ;
 ;;
 print qq{[[$t]]};
"
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]
[[aaa       : AAABBBccc       : CCC
]]

Still not what you expected, but one less pitfall to negotiate. (The [ ] expression is what I like to use to represent a space, where \s represents any whitespace character, a larger set.)
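To illustrate the difference between a bare space, `[ ]` and `\s` under `/x`, here is a small sketch (the test strings are invented for this example):

```perl
use strict;
use warnings;

my $spaces = "bbb   : BBB";   # spaces before the colon
my $tab    = "bbb\t: BBB";    # a tab before the colon

# Under /x a bare space in the pattern is ignored, so this is really /bbb:/.
my $bare        = $spaces =~ /bbb :/x      ? 1 : 0;   # 0
# [ ] is a character class holding one literal space; /x leaves it alone.
my $bracket     = $spaces =~ /bbb [ ]* :/x ? 1 : 0;   # 1
# [ ] does not accept a tab ...
my $bracket_tab = $tab    =~ /bbb [ ]* :/x ? 1 : 0;   # 0
# ... but \s, the larger set, does.
my $any_ws_tab  = $tab    =~ /bbb \s* :/x  ? 1 : 0;   # 1

print "bare=$bare bracket=$bracket bracket_tab=$bracket_tab any_ws_tab=$any_ws_tab\n";
```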

Further Update: The interpolation of $[ can be clearly seen here:

c:\@Work\Perl\monks>perl -wMstrict -e "my $rx = qr{$[.\n]*}m;  print $rx;"
(?^m:0.\n]*)

The default value of $[ is 0.
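For a side-by-side view, this sketch compiles the same tail of the pattern with and without `/x` and prints how Perl stringifies each (the variable names are mine; the exact flag text in the output may vary by Perl version, so none is shown):

```perl
use strict;
use warnings;

# Without /x, "$[" is read as the special variable and its value (0) is interpolated.
my $fused  = qr{$[.\n]*}m;
# With /x, the space keeps "$" apart from "[", so "$" remains an end-of-line anchor.
my $spaced = qr{$ [.\n]*}xm;

print "fused:  $fused\n";
print "spaced: $spaced\n";
```

The first pattern should show a literal `0.` where the character class was meant to be, while the second keeps `[.\n]` intact.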

Give a man a fish: <%-(-(-(-<


Re^2: Why multiline regex doesn't work? by nbd (Novice) on Jun 09, 2015 at 04:29 UTC
Thanks for the detailed explanation. That was exactly what I was asking about: which parts of both regexes work incorrectly. .\n was intended to match all characters, including the newline character (since with the //m modifier '.' doesn't match newline). But I see that within square brackets the dot must be escaped. So, if "all characters" is expressed as \s\S, the regex now works:

`my $d = $s =~ s/[\s\S]*^$m *: (.*)$(?:[\s\S]*)/$1/rm;`

Thanks!
Re^3: Why multiline regex doesn't work? by AnomalousMonk (Archbishop) on Jun 09, 2015 at 15:00 UTC
"... with //m modifier '.' doesn't match newline ..."

Just to be clear: With or without the `//m` regex modifier, the default behavior of the `.` (dot) metacharacter is to match everything except a newline. It is only the `//s` "dot matches all" modifier that causes dot to match absolutely everything.
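A short sketch of the two modifiers acting independently (the sample text is invented for this example):

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# Dot stops at the first newline, with or without /m.
my ($plain)  = $text =~ /(.*)/;     # "one"
my ($with_m) = $text =~ /(.*)/m;    # "one"
# Only /s lets dot cross newlines.
my ($with_s) = $text =~ /(.*)/s;    # "one\ntwo\n"

# What /m changes is the anchors: ^ and $ also match at embedded line boundaries.
my $line2_m    = $text =~ /^two$/m ? 1 : 0;   # 1
my $line2_no_m = $text =~ /^two$/  ? 1 : 0;   # 0

printf "plain=%d chars, /m=%d chars, /s=%d chars\n",
    length $plain, length $with_m, length $with_s;    # 3, 3, 8
print "second line anchored: with /m=$line2_m, without=$line2_no_m\n";
```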
Re^3: Why multiline regex doesn't work? by AnomalousMonk (Archbishop) on Jun 09, 2015 at 17:14 UTC
`my $d = $s =~ s/[\s\S]*^$m *: (.*)$(?:[\s\S]*)/$1/rm;`

The expression `[\s\S]` to express "match any character" cries out for comment. I assume it is used to avoid the `.` (dot) metacharacter when promoted by `//s` to "dot matches all" status. This rubs me the wrong way. If dot (with `//s`) matches all, why not just use it that way? (All code examples that follow enable warnings and strictures. Also note that the `//r` substitution modifier is only available with Perl versions 5.14+.)

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa : AAA\n}
       . qq{bbb : BBB\n}
       . qq{ccc : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~ s/.*^$m *: (.*)$(?:.*)/$1/rsm;
 ;;
 print qq{[[$t]]};
"
[[aaa : AAA
bbb : BBB
ccc : CCC
]]
[[BBB
ccc : CCC
]]

This is arguably clearer, with only the tiny problem that it doesn't work! Why not? Consider the `(.*)` capture group. With dot matching anything, it greedily grabs everything to the end of the string. To achieve an overall match, the regex still has to match `$` at the end of the string, which is easy, and `(?:.*)` "zero or more of anything" after the end of the string, also easy. So capture group 1 and `$1` now contain everything to the end of the string, which is substituted back into the string.

But the intent of `(.*)` was only to capture everything up to the `$` anchor before the first embedded newline (due to `//m`). How to restrain dot? One way would be to use a `?` "lazy" modifier for the normally greedy `*` match quantifier: dot will then match as little as necessary to get to the first `$` anchor.

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa : AAA\n}
       . qq{bbb : BBB\n}
       . qq{ccc : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~ s/.*^$m *: (.*?)$(?:.*)/$1/rsm;
 ;;
 print qq{[[$t]]};
"
[[aaa : AAA
bbb : BBB
ccc : CCC
]]
[[BBB]]

Now we're getting somewhere!

But one could argue that the intent of "anything except a newline" is more clearly expressed by `[^\n]` and "capture as much as possible to the first newline" is better as `([^\n]*)` (remember that the code must be maintained, one must assume forever, so clear intent is important).

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa : AAA\n}
       . qq{bbb : BBB\n}
       . qq{ccc : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~ s/.*^$m *: ([^\n]*)$(?:.*)/$1/rsm;
 ;;
 print qq{[[$t]]};
"
[[aaa : AAA
bbb : BBB
ccc : CCC
]]
[[BBB]]

(In this version, the `$` anchor is redundant, but does no harm and arguably serves to further clarify intent.)

Lastly, an example in my own preferred style, taken from TheDamian's PBP:

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa : AAA\n}
       . qq{bbb : BBB\n}
       . qq{ccc : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = qr{ bbb }xms;
 ;;
 my $t = $s =~ s{ .* ^ $m [ ]* : [ ] ([^\n]*) $ .* }{$1}xmsr;
 ;;
 print qq{[[$t]]};
"
[[aaa : AAA
bbb : BBB
ccc : CCC
]]
[[BBB]]

The `$m` is no longer defined as a raw string, but with `qr//` as a regex in its own right. This allows it to be used "atomically" within another regex, as it is in the substitution: expressions like `$m+` or `$m{4}` work as expected. The `$` is still redundant, but still arguably clarifies intent. The same could be said about the preceding `^` in the regex, but I would argue that anchoring the `$m` atom in some way is potentially important, so just leave it be.

And that's the first several inches of the whole nine regex yards. HTH
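To make the "atomic" point about `qr//` concrete, here is a small sketch contrasting a raw string with a `qr//` object under a `+` quantifier (the names `$raw` and `$atom` and the test strings are mine):

```perl
use strict;
use warnings;

my $raw  = 'ab';      # plain string: interpolated as bare text
my $atom = qr{ab};    # compiled regex: interpolated as a self-contained group

# /\A$raw+\z/ becomes /\Aab+\z/, so + applies to the 'b' only.
my $raw_abab  = 'abab' =~ /\A$raw+\z/  ? 1 : 0;   # 0
my $raw_abbb  = 'abbb' =~ /\A$raw+\z/  ? 1 : 0;   # 1
# With qr//, + applies to the whole 'ab' unit.
my $atom_abab = 'abab' =~ /\A$atom+\z/ ? 1 : 0;   # 1
my $atom_abbb = 'abbb' =~ /\A$atom+\z/ ? 1 : 0;   # 0

print "raw:  abab=$raw_abab abbb=$raw_abbb\n";
print "atom: abab=$atom_abab abbb=$atom_abbb\n";
```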
Re^3: Why multiline regex doesn't work? by Anonymous Monk on Jun 09, 2015 at 07:46 UTC
"But I see, that within square brackets the dot must be escaped."

Yes, you've got the idea; it's just the lingo you need help with now :) Escaping means prefixing a character with a backslash, i.e., turning "." into "\.", but that isn't required here: inside a character class such as `[.]` the dot is not a metacharacter, it is a literal character.

`(?s:.)` means any character (including \n), alias `[\w\W]`, alias `[\s\S]`, alias `[\d\D]`, alias `\p{All}`.
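A minimal sketch checking that each of those spellings accepts a newline as well as an ordinary character, while a plain dot does not (the array name `@any_char` is made up for this example):

```perl
use strict;
use warnings;

# Each pattern is anchored to match exactly one character.
my @any_char = (
    qr/\A(?s:.)\z/,
    qr/\A[\w\W]\z/,
    qr/\A[\s\S]\z/,
    qr/\A[\d\D]\z/,
    qr/\A\p{All}\z/,
);

my $match_newline = grep { "\n" =~ $_ } @any_char;   # how many accept "\n"
my $match_letter  = grep { "x"  =~ $_ } @any_char;   # how many accept "x"
my $plain_dot     = "\n" =~ /\A.\z/ ? 1 : 0;         # plain dot, no /s

print "newline: $match_newline of ", scalar @any_char, "\n";   # 5 of 5
print "letter:  $match_letter of ",  scalar @any_char, "\n";   # 5 of 5
print "plain dot matches newline: $plain_dot\n";               # 0
```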