Even after multiple attempts, I am at a total loss of how the "m" and "s" works for regex.
- /m changes the meaning of ^ and $:
  - Without /m,
    - ^ matches only at the very beginning of the string. (This is the same as \A, except that \A is not affected by /m.)
    - $ matches at the very end of the string, but if the string ends with \n, it will match just before and just after this \n. (This is the same as \Z, except that \Z is not affected by /m.)
  - With /m,
    - ^ matches at the very beginning of the string, and just after any \n, except if the \n is the last character in the string. In other words, it matches at the beginning of each line within the string.
    - $ matches just before each \n, in other words before the end of every line within the string, and at the very end of the string.
- /s changes the meaning of .:
  - Without /s, . matches anything except the newline, i.e. [^\n]. In other words, a regex of /.+/g is limited to matching one line within the string at a time.
  - With /s, . matches absolutely any character, including \n.
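The rules in the list above can be checked with a few lines of plain Perl. This is only a minimal sketch: offsets() and pieces() are hypothetical helper names made up for this example, and the test string "a\nb\n" is just one convenient choice. It prints the offsets at which each anchor matches, and the substrings that .+ matches with and without /s.

```perl
use strict;
use warnings;

# Hypothetical helper: start offsets of every match of $re in $str.
sub offsets {
    my ($str, $re) = @_;
    my @at;
    push @at, $-[0] while $str =~ /$re/g;
    return "@at";
}

# Hypothetical helper: the matched substrings, with newlines made visible.
sub pieces {
    my ($str, $re) = @_;
    my @got = $str =~ /$re/g;
    s/\n/\\n/g for @got;
    return join '|', @got;
}

my $str = "a\nb\n";   # offsets: 0 => a, 1 => \n, 2 => b, 3 => \n, 4 => end

printf "%-6s => %s\n", '/^/',   offsets($str, qr/^/);    # 0
printf "%-6s => %s\n", '/^/m',  offsets($str, qr/^/m);   # 0 2
printf "%-6s => %s\n", '/$/',   offsets($str, qr/$/);    # 3 4
printf "%-6s => %s\n", '/$/m',  offsets($str, qr/$/m);   # 1 3 4
printf "%-6s => %s\n", '/\A/m', offsets($str, qr/\A/m);  # 0   (\A ignores /m)
printf "%-6s => %s\n", '/\Z/m', offsets($str, qr/\Z/m);  # 3 4 (\Z ignores /m)
printf "%-6s => %s\n", '/.+/',  pieces($str, qr/.+/);    # a|b
printf "%-6s => %s\n", '/.+/s', pieces($str, qr/.+/s);   # a\nb\n (one match)
```

Note that /^/m does not report offset 4: the final \n is the last character of the string, so no new line starts after it.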
Note that /m and /s are completely independent of one another. Keep in mind that ^ and $ are zero-width matches: for example, with $_ = "a\nb", a regex of /$/gm will match and leave the regex engine's position just before the \n*, and a following regex of /./gs would then match that \n.

Here is some code to play around with (try changing the lists of strings and regexes in the two for loops). As the output shows, /m only makes a difference if the string contains a \n somewhere other than as its last character. And of course there's the WebPerl Regex Tester, which visualizes this as well (modern browser required).
use warnings;
use strict;
use open qw/:std :utf8/;
use Term::ANSIColor qw/colored/;

for my $str ( "a","a\n","a\nb","a\n\nb","a\nb\nc\n","a\nb\nc\nd") {
    for my $regex ( '/^/g','/^/gm','/$/g','/$/gm','/./g','/./gs' ) {
        # Render every character in a two-column cell (pad + character),
        # showing control characters as Unicode control pictures (e.g. U+240A
        # for \n); the trailing space is the cell for the end of the string.
        my $o = join( '', map { sprintf "%2s",
                chr( $_<0x21 ? 0x2400+$_ : $_==0x7F ? 0x2421 : $_ ) }
            map ord, split //, $str )." ";
        # Record the start (@-) and end (@+) offsets of every match.
        my @matches;
        eval qq{ push \@matches, [[\@-],[\@+]] while \$str=~$regex ;1}
            or die $@;
        my ($matchcnt,%matches) = (1);
        for my $match (@matches) {
            # A zero-width match marks the pad column in front of its offset;
            # any other match marks the column of each character it covers.
            my @pos = $match->[0][0]==$match->[1][0]
                ? ( $match->[0][0] * 2 )
                : map { $_*2+1 } $match->[0][0]..$match->[1][0]-1;
            for my $p (@pos) {
                die "overlapping matches not supported"
                    if exists $matches{$p};
                $matches{$p} = $matchcnt;
            }
        } continue { $matchcnt++ }
        # Underline from right to left so that earlier offsets stay valid.
        substr($o, $_, 1) = colored(['underline'], substr($o, $_, 1))
            #"<u>".substr($o, $_, 1)."</u>" # alternative for HTML
            for sort { $b<=>$a } keys %matches;
        printf "%6s: %s\n", $regex, $o;
    }
}
Output:
/^/g: a
/^/gm: a
/$/g: a
/$/gm: a
/./g: a
/./gs: a
/^/g: a ␊
/^/gm: a ␊
/$/g: a ␊
/$/gm: a ␊
/./g: a ␊
/./gs: a ␊
/^/g: a ␊ b
/^/gm: a ␊ b
/$/g: a ␊ b
/$/gm: a ␊ b
/./g: a ␊ b
/./gs: a ␊ b
/^/g: a ␊ ␊ b
/^/gm: a ␊ ␊ b
/$/g: a ␊ ␊ b
/$/gm: a ␊ ␊ b
/./g: a ␊ ␊ b
/./gs: a ␊ ␊ b
/^/g: a ␊ b ␊ c ␊
/^/gm: a ␊ b ␊ c ␊
/$/g: a ␊ b ␊ c ␊
/$/gm: a ␊ b ␊ c ␊
/./g: a ␊ b ␊ c ␊
/./gs: a ␊ b ␊ c ␊
/^/g: a ␊ b ␊ c ␊ d
/^/gm: a ␊ b ␊ c ␊ d
/$/g: a ␊ b ␊ c ␊ d
/$/gm: a ␊ b ␊ c ␊ d
/./g: a ␊ b ␊ c ␊ d
/./gs: a ␊ b ␊ c ␊ d
* Update: Note that the perlre section "Repeated Patterns Matching a Zero-length Substring" is relevant here (example).
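The "a\nb" case from the text can be traced step by step with pos(). This is a rough sketch, not taken from the linked example; the variable names are made up for illustration. It shows where a zero-width /$/gm match leaves the engine, that a following /./gs then consumes the \n, and that repeating a zero-width pattern under /g still terminates.

```perl
use strict;
use warnings;

my $str = "a\nb";

# Scalar-context //g remembers where the last match ended in pos($str).
$str =~ /$/gm or die "expected a match";
my $after_anchor = pos($str);    # 1: zero-width match just before the \n

# The next //g match resumes from there, so . with /s consumes the \n itself.
$str =~ /(.)/gs or die "expected a match";
my $char      = $1;              # "\n"
my $after_dot = pos($str);       # 2

# A zero-width pattern under //g does not loop forever: after an empty
# match, the engine will not accept another empty match at the same spot.
pos($str) = undef;               # start over from the beginning
my @ends;
push @ends, pos($str) while $str =~ /$/gm;

printf "after /\$/gm : pos = %d\n", $after_anchor;               # 1
printf "after /./gs : pos = %d, matched chr(%d)\n",
    $after_dot, ord $char;                                       # 2, chr(10)
printf "all /\$/gm matches end at: %s\n", "@ends";               # 1 3
```

Assigning undef to pos($str) is used here only to reset the search position between the two parts of the demonstration.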