
my $d = $s =~ s/[\s\S]*^$m *: (.*)$(?:[\s\S]*)/$1/rm;

Using [\s\S] to express "match any character" cries out for comment. I assume it is used to avoid relying on the . (dot) metacharacter, which normally matches anything except a newline, as promoted by //s to "dot matches all" status.
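To make the difference concrete, here is a minimal standalone sketch comparing plain dot, [\s\S], and dot under //s against an embedded newline. The string "x\ny" and the variable names are purely illustrative.

```perl
use strict;
use warnings;

my $str = "x\ny";    # illustrative input with one embedded newline

# Without //s, dot matches anything EXCEPT a newline, so this fails.
my $dot_plain = $str =~ /^x.y$/ ? 'match' : 'no match';

# [\s\S] ("whitespace or non-whitespace") covers every character,
# newline included, with or without //s.
my $class = $str =~ /^x[\s\S]y$/ ? 'match' : 'no match';

# With //s, dot is promoted to "matches anything", newline included.
my $dot_s = $str =~ /^x.y$/s ? 'match' : 'no match';

print "dot:       $dot_plain\n";
print "[\\s\\S]:    $class\n";
print "dot + //s: $dot_s\n";
```

So [\s\S] without //s and dot with //s are interchangeable here; the question is only which one states the intent more clearly.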

This rubs me the wrong way. If dot (with //s) matches all, why not just use it that way? (All code examples that follow enable warnings and strictures. Also note that the //r substitution modifier is only available in Perl 5.14 and later.)

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa       : AAA\n}
       . qq{bbb       : BBB\n}
       . qq{ccc       : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~ s/.*^$m *: (.*)$(?:.*)/$1/rsm;
 ;;
 print qq{[[$t]]};
"
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]
[[BBB
ccc       : CCC
]]

This is arguably clearer, with only the tiny problem that it doesn't work! Why not?

Consider the (.*) capture group. With dot matching anything, it greedily grabs everything to the end of the string. To achieve an overall match, the regex still has to match $, which is easy at the end of the string, and then (?:.*) ("zero or more of anything"), which is also easy there because it can match nothing at all. So capture group 1, and hence $1, now contains everything from BBB to the end of the string, and that replaces the entire match, which here is the whole string.
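One way to see this directly is a plain match instead of a substitution, so $1 can be inspected. This is a sketch using the same sample data; the leading .* and trailing (?:.*) are dropped because, with only one bbb line, they don't change what group 1 captures.

```perl
use strict;
use warnings;

my $s = "aaa       : AAA\n"
      . "bbb       : BBB\n"
      . "ccc       : CCC\n";
my $m = 'bbb';

# Under //s the greedy (.*) runs to the very end of the string, and
# $ (under //m) is still satisfied there, so nothing forces a backtrack.
my ($greedy) = $s =~ /^$m *: (.*)$/sm;

# Show the embedded newlines as literal \n so they are visible.
(my $shown = $greedy) =~ s/\n/\\n/g;
print "captured: [$shown]\n";
```

The capture runs past the end of the bbb line and includes the whole ccc line and the final newline.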

But the intent of (.*) was only to capture everything up to the first embedded newline, the first point at which $ can match (due to //m). How to restrain dot?

One way would be to use a *? "lazy" modifier for the normally greedy * match quantifier: dot will then match as little as necessary to get to the first $ anchor.

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa       : AAA\n}
       . qq{bbb       : BBB\n}
       . qq{ccc       : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~ s/.*^$m *: (.*?)$(?:.*)/$1/rsm;
 ;;
 print qq{[[$t]]};
"
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]
[[BBB]]

Now we're getting somewhere!

But one could argue that "anything except a newline" is more clearly expressed by [^\n], and "capture as much as possible up to the first newline" by ([^\n]*). (Remember that the code must be maintained, one must assume forever, so clearly expressed intent is important.)

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa       : AAA\n}
       . qq{bbb       : BBB\n}
       . qq{ccc       : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = 'bbb';
 ;;
 my $t = $s =~ s/.*^$m *: ([^\n]*)$(?:.*)/$1/rsm;
 ;;
 print qq{[[$t]]};
"
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]
[[BBB]]

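(In this version, the $ anchor is redundant: ([^\n]*) always stops either just before a newline or at the end of the string, and $ under //m matches at both places. But it does no harm and arguably serves to further clarify intent.)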
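A quick sanity check of that redundancy, as a sketch: run the substitution with and without the $ on a few made-up inputs (they are illustrative, not from the original thread) and compare the results.

```perl
use strict;
use warnings;

# Illustrative inputs: key line in the middle, key line last with and
# without a trailing newline, and a key line with an empty value.
my @inputs = (
    "aaa : AAA\nbbb : BBB\nccc : CCC\n",
    "aaa : AAA\nbbb : BBB\n",
    "aaa : AAA\nbbb : BBB",
    "bbb : \nccc : CCC\n",
);
my $m = 'bbb';
my @same;    # one true/false entry per input

for my $s (@inputs) {
    # With the (redundant) $ anchor, as in the example above.
    my $with    = $s =~ s/.*^$m *: ([^\n]*)$(?:.*)/$1/rsm;
    # Without it.
    my $without = $s =~ s/.*^$m *: ([^\n]*)(?:.*)/$1/rsm;

    my $verdict = $with eq $without ? 'same' : 'DIFFERENT';
    push @same, $with eq $without;
    print "$verdict: [$with]\n";
}
```

Every input should report "same", since [^\n]* can only stop where $ would match anyway.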

Lastly, an example in my own preferred style, taken from TheDamian's PBP (Perl Best Practices):

c:\@Work\Perl\monks>perl -wMstrict -le
"my $s = qq{aaa       : AAA\n}
       . qq{bbb       : BBB\n}
       . qq{ccc       : CCC\n}
       ;
 print qq{[[$s]]};
 ;;
 my $m = qr{ bbb }xms;
 ;;
 my $t = $s =~ s{ .* ^ $m [ ]* : [ ] ([^\n]*) $ .* }{$1}xmsr;
 ;;
 print qq{[[$t]]};
"
[[aaa       : AAA
bbb       : BBB
ccc       : CCC
]]
[[BBB]]

$m is no longer defined as a raw string, but with qr// as a regex in its own right. This allows it to be used "atomically" within another regex, as it is in the substitution: a qr// object interpolates wrapped in its own non-capturing group, so expressions like $m+ or $m{4} quantify the whole sub-pattern, as expected. (With the raw string 'bbb', $m+ would instead mean "bb followed by one or more b".) The $ is still redundant, but still arguably clarifies intent. The same could be said of the ^ preceding $m, but I would argue that anchoring the $m atom in some way is potentially important, so just leave it be.
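A small sketch of that "atomic" behavior. The names $raw and $qr and the string 'ababab' are made up purely for illustration.

```perl
use strict;
use warnings;

my $raw = 'ab';         # plain string: interpolates as the bare characters a, b
my $qr  = qr{ ab }xms;  # compiled regex: interpolates as a self-contained group

my $text = 'ababab';

# $raw+ becomes ab+ : an "a" followed by one or more "b"s.
my $raw_plus = $text =~ m{ \A $raw+ \z }xms ? 'match' : 'no match';

# $qr+ quantifies the whole sub-pattern: one or more "ab"s.
my $qr_plus = $text =~ m{ \A $qr+ \z }xms ? 'match' : 'no match';

# In a pattern, {3} right after $qr is taken as a quantifier (not a hash
# subscript), so this means exactly three "ab"s.
my $qr_three = $text =~ m{ \A $qr{3} \z }xms ? 'match' : 'no match';

print "raw string, +: $raw_plus\n";
print "qr//, +:       $qr_plus\n";
print "qr//, {3}:     $qr_three\n";
```

Only the qr// versions treat ab as a single unit.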

And that's the first several inches of the whole nine regex yards. HTH

Give a man a fish: <%-(-(-(-<

In reply to Re^3: Why multiline regex doesn't work? by AnomalousMonk
in thread Why multiline regex doesn't work? by nbd
