Re: /m pattern matching modifier
by toolic (Bishop) on Oct 21, 2011 at 13:20 UTC
(?m-isx:^.*$)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?m-isx: group, but do not capture (with ^ and $
matching start and end of line) (case-
sensitive) (with . not matching \n)
(matching whitespace and # normally):
----------------------------------------------------------------------
^ the beginning of a "line"
----------------------------------------------------------------------
.* any character except \n (0 or more times
(matching the most amount possible))
----------------------------------------------------------------------
$ before an optional \n, and the end of a
"line"
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
The .* does not match a newline character, and $ matches before an optional trailing newline (with /m, also before any embedded newline). Perhaps you also want the /s modifier (see perlre):
use warnings;
use strict;
"AAC\nGTT\n" =~ /^.*$/ms;
print $&;
__END__
AAC
GTT
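To see what /s changes, here is a small sketch that is not from the original posts. It captures the first match of the same pattern with and without /s. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $seq = "AAC\nGTT\n";

# Without /s, . stops at a newline, so the leftmost match is just the first line.
my ($without_s) = $seq =~ /^(.*)$/m;

# With /s, . also matches "\n", so the greedy .* runs to the end of the string,
# where $ (under /m) can still match.
my ($with_s) = $seq =~ /^(.*)$/ms;

printf "without /s: [%s]\n", $without_s;
printf "with /s:    [%s]\n", $with_s;
```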
Thanks for the quick response.
I can understand how the /s modifier changes the behavior of other metacharacters, but I still have difficulty understanding the logic of /m.
My initial test code, "AAC\nGTT"=~/^.*$/m, only finds the first match (thanks to the second monk who responded to my initial question). Then I tried "AAC\nGTT"=~/^.*$/mg, which still gave AAC. Since .* won't match \n even with /m, the second part of the string, GTT, should also fit the regular expression ^.*$ according to the definition of /m.
Interestingly, "\nGTT\n"=~/^.*$/m gave nothing,
while "GTT\n"=~/^.*$/mg shows GTT.
Thanks
Re: /m pattern matching modifier
by moritz (Cardinal) on Oct 21, 2011 at 13:40 UTC
I think you understand ^ and $ just fine; if /m is in effect, these are the positions where they can match (each marker sits under the character it precedes):
"AAC\nGTT\n"
 ^  $ ^  $ $
But the regex only searches for the first match, and because the dot doesn't match \n (it would only do that with /s), the match runs from the first A to the C and stops there. If you ask perl to do a second match, it will find GTT:
$ perl -wle 'print $& while "AAC\nGTT"=~/^.*$/mg;'
AAC
GTT
If you want to match the second line straight away, you can do something like this:
$ perl -wle '"AAC\nGTT"=~/.*^(.*)$/ms; print $1'
GTT
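The positions in the diagram above can be checked with a short sketch, which is my own and not from the thread. It records pos() after each zero-width /^/m and /$/m match. Offsets are 0-based and count characters of the real string, so each \n is a single character. Note that ^ does not match after the final newline, because Perl's /m excludes a newline that is the last character of the string.

```perl
use strict;
use warnings;

my $s = "AAC\nGTT\n";
my (@caret, @dollar);

# For a zero-width match, pos() right after the match is the matching position.
push @caret,  pos $s while $s =~ /^/mg;
push @dollar, pos $s while $s =~ /$/mg;

print "^ matches at offsets: @caret\n";
print "\$ matches at offsets: @dollar\n";
```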
Re: /m pattern matching modifier
by ikegami (Patriarch) on Oct 21, 2011 at 13:50 UTC
for me the last pattern match should be GTT.
You only match once. How does "last" fit in? Perhaps you mean you expect the match operator to match as late as possible? If so, that's wrong; it matches as early as possible.
Add a leading (?s:.*) to make it match as late as possible.
"AAC\nGTT" =~ /(?s:.*)^.*$/m;
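A sketch, not from the original post, that compares the default leftmost match with the "as late as possible" variant. One detail is easy to miss with the one-liner as written: $& also includes everything the (?s:.*) prefix consumed, which here is the whole string. The example therefore adds a capture group to pull out only the last line. The capture group and variable names are my additions.

```perl
use strict;
use warnings;

my @results;
for my $s ("AAC\nGTT", "AAC\nGTT\n") {
    # Leftmost match (the default): the first line.
    my ($first) = $s =~ /^(.*)$/m;

    # A greedy, newline-crossing (?s:.*) prefix pushes the rest of the match
    # as far right as it can go; the capture keeps the prefix out of $1.
    my ($last) = $s =~ /(?s:.*)^(.*)$/m;

    push @results, "$first/$last";
}
print "$_\n" for @results;

# Without a capture, $& spans the prefix too.
"AAC\nGTT" =~ /(?s:.*)^.*$/m or die "no match";
my $whole = $&;
print "\$& = [$whole]\n";
```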
Re: /m pattern matching modifier
by jethro (Monsignor) on Oct 21, 2011 at 13:25 UTC
Try "AAC\nGTT"=~/^.*$/mg. The way you have it now, it will only look for the first match.
#!/usr/bin/perl
use warnings;
use strict;
"AAC\nGTT"=~/^.*$/mg;
print "\$& scalar context => $&\n";
my @matches = "AAC\nGTT"=~/^.*$/mg;
print "\$& list context => $&\n";
print "\@matches => @matches\n";
__END__
output:
$& scalar context => AAC
$& list context => GTT
@matches => AAC GTT
Thanks. I tried both "AAC\nGTT"=~/^.*$/mg and "AAC\nGTT\n"=~/^.*$/mg, but the result stays the same: AAC.
How can this be explained?
use strict;
use warnings;

# scalar context: each m//g resumes where the previous match stopped
{
    my $x = "AAC\nGTT";
    while ($x =~ /^.*$/mg) {
        print "$&\n";
    }
}

# or unroll the loop by hand if you know how many times it will match:
{
    my $x = "AAC\nGTT";
    $x =~ /^.*$/mg;
    print "$&\n";
    $x =~ /^.*$/mg;
    print "$&\n";
}

# or use the regex in list context, which returns all matches at once:
{
    my $x = "AAC\nGTT";
    my @allhits = $x =~ /^.*$/mg;
    print join(" - ", @allhits), "\n";
}
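Why does a single "AAC\nGTT"=~/^.*$/mg statement show only AAC? In scalar or void context, m//g performs one match per execution and stores where it stopped in pos(). The next execution then resumes from that point. The following sketch, which is not from the thread, logs pos($x) inside the same kind of while loop. It shows that the final failed attempt resets pos to undef.

```perl
use strict;
use warnings;

my $x = "AAC\nGTT";
my @log;

# Each scalar-context m//g resumes at pos($x), where the previous one stopped.
while ($x =~ /^.*$/mg) {
    push @log, "matched '$&', pos now " . pos($x);
}
# The failed final attempt resets pos($x) to undef.
push @log, 'pos after loop: ' . (defined pos($x) ? pos($x) : 'undef');

print "$_\n" for @log;
```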
Note: This would have served better as a reply to Re^2: /m pattern matching modifier, which first mentions the "\nGTT\n" string.
The regex /^.*$/mg matches the empty string (not 'nothing', i.e., no match) in the string "\nGTT\n". The /m regex modifier causes ^ to match at the start of the string (the default) and also immediately after an embedded newline. It causes $ to match at its default positions (the end of the string, or just before a newline at the end) and also just before an embedded newline.
A regex looks for the leftmost match. The leftmost position in the string above where the regex can match is the absolute start of the string. There, ^ matches, .* matches zero characters (the next character is a newline), and $ matches just before the first newline, so the matched string is the empty string.
Regexes are often counter-intuitive!
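The "empty string, not nothing" point can be made concrete with a small sketch that is not from the thread. It checks the return value of the match and reads the match offsets from @- and @+. It then shows that list-context /g finds both the empty first line and GTT.

```perl
use strict;
use warnings;

my $s = "\nGTT\n";

# The match succeeds, but what it matched is the empty string at offset 0:
# ^ at the start, .* matching zero characters, $ just before the first "\n".
my $ok      = $s =~ /^.*$/m;
my $matched = $&;
my ($start, $end) = ($-[0], $+[0]);    # start and end offsets of the match
printf "success: %s, \$& = '%s', offsets %d..%d\n",
    ($ok ? 'yes' : 'no'), $matched, $start, $end;

# In list context /g collects every line, including the empty first one.
my @lines = $s =~ /^.*$/mg;
print scalar(@lines), " matches: ", join(', ', map { "'$_'" } @lines), "\n";
```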
Updates:
- And, as jethro said, even with the /g modifier, the regex matching in void or scalar context will still only return the leftmost of all possible matches on the first match attempt.
- s/place/position/g, s/just before a newline/just before the first newline/ in the foregoing text.
Re: /m pattern matching modifier
by tandx (Novice) on Oct 21, 2011 at 14:15 UTC
Thanks to all of you, now I understand what happened in the code. But I have to admit that pattern matching with /g in scalar context is a bit "out of" my thinking logic :-).
Once again, thank you all for your help. It has been a productive discussion for me.