tandx has asked for the wisdom of the Perl Monks concerning the following question:

I just could not figure out how /m works with ^ in regular expression matching. I tested the following code:
"AAC\nGTT"=~/^.*$/m;
print $&;
which gives AAC.
However, since /m makes ^ and $ match around embedded newlines, for me the last pattern match should be GTT.
The modified code gives the same result:
"AAC\nGTT\n"=~/^.*$/m;
print $&;
So it looks to me as if /m did not modify the default behavior of ^. Please help me with this entry-level question. Thanks.

Re: /m pattern matching modifier
by toolic (Bishop) on Oct 21, 2011 at 13:20 UTC
    YAPE::Regex::Explain might help you here:
    (?m-isx:^.*$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?m-isx:                 group, but do not capture (with ^ and $
                             matching start and end of line) (case-
                             sensitive) (with . not matching \n)
                             (matching whitespace and # normally):
    ----------------------------------------------------------------------
      ^                        the beginning of a "line"
    ----------------------------------------------------------------------
      .*                       any character except \n (0 or more times
                               (matching the most amount possible))
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of a
                               "line"
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
    The .* does not match a newline character, and the newline before $ is optional. Perhaps you also want the /s modifier (see perlre):
    use warnings;
    use strict;

    "AAC\nGTT\n" =~ /^.*$/ms;
    print $&;

    __END__
    AAC
    GTT
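A minimal sketch, not from the original reply, that runs the same pattern against the same string with each combination of /m and /s, so the effect of each modifier can be compared side by side. The helper name try_match is hypothetical.

```perl
use strict;
use warnings;

# try_match is a hypothetical helper: it returns the matched text,
# or undef when the pattern does not match at all.
sub try_match {
    my ($text, $re) = @_;
    return $text =~ /($re)/ ? $1 : undef;
}

my $text = "AAC\nGTT\n";

# The same pattern compiled with each combination of modifiers.
my @cases = (
    [ 'no modifier', qr/^.*$/   ],
    [ '/m',          qr/^.*$/m  ],
    [ '/s',          qr/^.*$/s  ],
    [ '/ms',         qr/^.*$/ms ],
);

for my $case (@cases) {
    my ($label, $re) = @$case;
    my $got = try_match($text, $re);
    if (defined $got) {
        $got =~ s/\n/\\n/g;    # make newlines visible in the output
        print "$label: matched '$got'\n";
    } else {
        print "$label: no match\n";
    }
}
```

Without any modifier the pattern does not match this string at all, because $ can only match at the very end or before the final newline and .* cannot cross the first newline. With /m the match is AAC, and with /s or /ms it is the whole string. So /m does change the outcome here, even though the match still begins at the first line.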
      Thanks for the quick response.
      I can understand how the /s modifier changes the behavior of other metacharacters, but I still have difficulty understanding the logic of /m.
      My initial test code, "AAC\nGTT"=~/^.*$/m, only finds the first match (thanks to the second monk who responded to my initial question). Then I tested "AAC\nGTT"=~/^.*$/mg, which still gave AAC. Since .* with /m won't match \n, the second part of the string, GTT, should fit the regular expression ^.*$ according to the definition of /m.
      Interestingly, "\nGTT\n"=~/^.*$/m gave nothing,
      while "GTT\n"=~/^.*$/mg shows GTT.
      Thanks
Re: /m pattern matching modifier
by moritz (Cardinal) on Oct 21, 2011 at 13:40 UTC

    I think you understand ^ and $ just fine; if /m is in effect, this is where they can match:

    "AAC\nGTT\n"
     ^  $$^  $$

    But the regex only searches for the first match, and because the dot doesn't match the \n (it would only do that with /s), it goes from A to C. If you ask perl to do a second match, it will find GTT:

    $ perl -wle 'print $& while "AAC\nGTT"=~/^.*$/mg;'
    AAC
    GTT

    If you want to match the second line straight away, you can do something like this:

    $ perl -wle '"AAC\nGTT"=~/.*^(.*)$/ms; print $1'
    GTT
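To see where the anchors can match under /m, here is a small sketch (not part of the original reply) that collects the offsets of every zero-width match of ^ and of $ in the string. The helper name anchor_positions is hypothetical.

```perl
use strict;
use warnings;

# anchor_positions is a hypothetical helper: it returns every offset at
# which the given zero-width pattern matches in $str.
sub anchor_positions {
    my ($str, $re) = @_;
    my @offsets;
    # In scalar context, /g resumes from pos($str) on each iteration.
    while ($str =~ /$re/g) {
        push @offsets, pos($str);
    }
    return @offsets;
}

my $text = "AAC\nGTT\n";

my @caret  = anchor_positions($text, qr/^/m);
my @dollar = anchor_positions($text, qr/$/m);

print "^ can match at offsets: @caret\n";
print "\$ can match at offsets: @dollar\n";
```

For this string the sketch reports offsets 0 and 4 for ^, and offsets 3, 7 and 8 for $. Offset 8 is the very end of the string: $ matches there, but ^ does not, because under /m a newline that is the last character of the string does not start a new "line".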
Re: /m pattern matching modifier
by ikegami (Patriarch) on Oct 21, 2011 at 13:50 UTC

    for me the last pattern match should be GTT.

    You only match once. How does "last" fit in? Perhaps you mean you expect the match operator to match as late as possible? If so, that's wrong; it matches as early as possible.

    Add a leading (?s:.*) to make it match as late as possible.

    "AAC\nGTT" =~ /(?s:.*)^.*$/m;
Re: /m pattern matching modifier
by jethro (Monsignor) on Oct 21, 2011 at 13:25 UTC
    Try "AAC\nGTT"=~/^.*$/mg;. The way you have it now it will only look for the first match.

      The OP is asking why the first match is returned by $&. Even with /g the first match is returned because it is not in list context.

      #!/usr/bin/perl
      use warnings;
      use strict;

      "AAC\nGTT"=~/^.*$/mg;
      print "\$& scalar context => $&\n";

      my @matches = "AAC\nGTT"=~/^.*$/mg;
      print "\$& list context => $&\n";
      print "\@matches => @matches\n";

      __END__
      output:
      $& scalar context => AAC
      $& list context => GTT
      @matches => AAC GTT

        'g' is not restricted to list context. If you use a regex on a specific variable multiple times with the 'g' modifier you will get all the matches one by one.

      Thanks. I tried both:
      "AAC\nGTT"=~/^.*$/mg
      and
      "AAC\nGTT\n"=~/^.*$/mg.
      However, the result stays the same: AAC.
      How can this be explained?

        For 'g' to work in scalar context you need to search multiple times. AND you have to search the same variable, because the variable remembers the location up to which it searched last. Or use an array and list context:

        Scalar usage:

        #scalar usage:
        my $x="AAC\nGTT";
        while ($x=~/^.*$/mg) {
            print $&;
        }

        #or do the loop by hand if you know how many times it will match:
        $x="AAC\nGTT";
        $x=~/^.*$/mg;
        print $&;
        $x=~/^.*$/mg;
        print $&;

        or use the regex in list context:

        my $x="AAC\nGTT";
        my @allhits= $x=~/^.*$/mg;
        print join(" - ", @allhits),"\n";
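The "variable remembers the location" part can be observed with pos(). The following is a minimal sketch, not from the original reply; the helper name matches_with_pos is hypothetical.

```perl
use strict;
use warnings;

# matches_with_pos is a hypothetical helper: it walks through a string with
# a scalar-context /g match and records each match together with the
# position that pos() reports afterwards.
sub matches_with_pos {
    my ($str) = @_;
    my @seen;
    while ($str =~ /(^.*$)/mg) {
        push @seen, "$1:" . pos($str);
    }
    return @seen;
}

my $x = "AAC\nGTT";
print "$_\n" for matches_with_pos($x);

# One more demonstration on $x itself: a single scalar-context /g match
# leaves pos($x) just past the first line, so the next attempt on the same
# variable would continue from there rather than from the start.
$x =~ /^.*$/mg;
print "pos after one match: ", pos($x), "\n";
```

On this input the loop records AAC:3 and then GTT:7, and the final line prints 3. Matching against a string literal, as in the original one-liner, gives Perl no variable in which to keep that position between statements, which is why each attempt there started over and returned AAC.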

        Note: This would have served better as a reply to Re^2: /m pattern matching modifier, which first mentions the "\nGTT\n" string.

        The regex /^.*$/mg matches the empty string (not 'nothing', i.e., no match) in the string "\nGTT\n" because the /m regex modifier causes ^ to match at the start of the string (the default) and also immediately after an embedded newline, and causes $ to match at its default positions and also just before an embedded newline.

        A regex looks for the leftmost match. At the leftmost position in the string above, every part of the regex succeeds: ^ matches at the absolute start of the string, .* matches zero characters (the next character is a newline), and $ matches just before the first newline. The string that exists at this position is the empty string.

        Regexes are often counter-intuitive!

        Updates:

        1. And, as jethro said, even with the /g modifier, the regex matching in void or scalar context will still only return the leftmost of all possible matches on the first match attempt.
        2. s/place/position/g, s/just before a newline/just before the first newline/ in the foregoing text.
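The difference between an empty match and no match, as described in this reply, can be checked with the @- and @+ offset arrays. A minimal sketch, with the hypothetical helper name match_span:

```perl
use strict;
use warnings;

# match_span is a hypothetical helper: it returns the start and end offsets
# of the first match of /^.*$/m, or an empty list if there is no match.
sub match_span {
    my ($str) = @_;
    return unless $str =~ /^.*$/m;
    # $-[0] and $+[0] hold the start and end offsets of the whole match.
    return ($-[0], $+[0]);
}

for my $str ("\nGTT\n", "GTT\n") {
    (my $shown = $str) =~ s/\n/\\n/g;    # make newlines visible
    my @span = match_span($str);
    if (@span) {
        printf "\"%s\": matched from %d to %d (length %d)\n",
            $shown, @span, $span[1] - $span[0];
    } else {
        print "\"$shown\": no match\n";
    }
}
```

For "\nGTT\n" the match runs from offset 0 to offset 0, a successful match of length 0, which is why printing $& shows nothing. For "GTT\n" it runs from 0 to 3.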

Re: /m pattern matching modifier
by tandx (Novice) on Oct 21, 2011 at 14:15 UTC
    Thanks to all of you; now I understand what happened in the code. But I have to admit that pattern searching on a scalar with /g is a bit "outside" my way of thinking :-).

    Once again, thank you all for your help. It is a productive discussion for me.