saintmike has asked for the wisdom of the Perl Monks concerning the following question:

What's the result of the following code snippet?
my $string = "12\n34"; $string =~ s/.*/go/gs;
Pick one:

Re: Regex Pop Quiz with .*, /g, and /s (bug)
by tye (Sage) on Oct 02, 2007 at 18:05 UTC

    Just a couple of days ago Larry himself admitted (in the CB) that this is a bug in Perl.

    - tye        

Re: Regex Pop Quiz with .*, /g, and /s
by Thelonius (Priest) on Oct 02, 2007 at 18:13 UTC
    I'd have to say that it was not what I expected. I'm also surprised that this hasn't bitten me before. While I know theoretically that * can match empty strings, perl has always seemed to me to do the intuitive thing.

    I guess I generally don't replace nothing with something. Usually when I use * I'm either just skipping over white space (or the moral equivalent), or I replace any pattern that has * in it with itself. That is, something like s/A(.*)B/C${1}D/g;

    Generally, I also try to constrain my patterns more, so I usually avoid constructs like ".*" or "*?", and would write something like s/A([^B]*)B/C${1}D/g;
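
    A minimal sketch of the difference between those two substitutions. The input "AxB AyB" is a made-up sample, not something from the original question:

```perl
use strict;
use warnings;

# Hypothetical sample input: two separate A...B spans.
my $greedy      = "AxB AyB";
my $constrained = "AxB AyB";

# Greedy .* runs to the last B, so one match spans both pairs.
$greedy =~ s/A(.*)B/C${1}D/g;

# [^B]* cannot cross a B, so each pair is matched on its own.
$constrained =~ s/A([^B]*)B/C${1}D/g;

print "$greedy\n";       # CxB AyD
print "$constrained\n";  # CxD CyD
```

    Neither version has the quiz's surprise, because both patterns need at least "AB" to match and so can never match the empty string.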

    The /s doesn't seem to have anything to do with it, except that, of course, you included a \n in your string. For example,

    my $string = "aaab"; $string =~ s/a*/go/g;
    Now $string is "gogobgo".
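
    To see where those three replacements come from, here is a rough sketch that walks the same pattern with m//g in scalar context and records each match's offsets (the @matches and $copy names are just for illustration):

```perl
use strict;
use warnings;

my $string = "aaab";
my @matches;

# In scalar context, m//g finds one match per iteration.
while ($string =~ /a*/g) {
    # $-[0] and $+[0] hold the start and end offsets of the last match.
    push @matches, sprintf("[%d,%d) '%s'", $-[0], $+[0], $&);
}
print "$_\n" for @matches;
# [0,3) 'aaa'
# [3,3) ''
# [4,4) ''

# The substitution replaces each of those three matches with "go".
(my $copy = $string) =~ s/a*/go/g;
print "$copy\n";   # gogobgo
```

    So there is one real match ("aaa"), then an empty match just before the "b", then an empty match at the end of the string.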
Re: Regex Pop Quiz with .*, /g, and /s
by kyle (Abbot) on Oct 02, 2007 at 17:54 UTC

    I'm not really into the quiz thing (except maybe in the polling section), so I just ran it.

    The answer is "none of the above". Since you haven't used the /m modifier, Perl won't treat it as a "single line". The replacement first takes off the first three characters (which includes the newline), and then it goes for another pass to get the last two characters.

    I'm not really all that up on how /m does what it does, but the $* perlvar entry talks about optimization. I take this to mean that it tells Perl when to get sloppy in the name of speed.

    What I find most interesting, however, is that the /g seems to have something to do with it too. That is, /ms does one replacement, but /gms does two replacements.

    It makes my brain hurt. Basically, when you replace something that can have zero width, you're headed for a land of confusion (stop by and say hi; I hang out there a lot).

      Since you haven't used the /m modifier, Perl won't treat it as a "single line".

      No. That's backwards. Single-line is the default. m switches to (m)ultiline mode.

      Furthermore, m only affects the definition of ^ and $. Since neither is used here, whether m is present or absent is irrelevant.
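
      A quick sketch to check that claim, using the quiz's string. The variable names and the s/^/> / pattern are mine, added only to show a case where m does matter:

```perl
use strict;
use warnings;

# The quiz pattern has no ^ or $, so adding m changes nothing.
my $with_s  = "12\n34";
my $with_ms = "12\n34";
$with_s  =~ s/.*/go/gs;
$with_ms =~ s/.*/go/gms;
print $with_s eq $with_ms ? "same\n" : "different\n";   # same

# With ^ in the pattern, m decides whether line starts inside the string count.
my $no_m = "12\n34";
my $m    = "12\n34";
$no_m =~ s/^/> /g;    # only the start of the string: "> 12\n34"
$m    =~ s/^/> /gm;   # the start of every line:      "> 12\n> 34"
print "$no_m\n---\n$m\n";
```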

      The replacement first takes off the first three characters (which includes the newline), and then it goes for another pass to get the last two characters.

      No. In fact, the \n is a red herring. The same problem occurs with my $string = "1234";.

      The first pass sees the characters at pos 0 to 3 replaced with "go", setting pos = 4.
      The second pass sees the empty string at pos 4 (pos 4 to 4) replaced with "go", leaving pos = 4.
      The third pass ends the g loop since the only possible match would start and end at the same positions as the second pass.

      my $string = "1234"; $string =~ s/.*/print("($&)");'go'/egs; # (1234)()

      What I find most interesting, however is that the /g seems to have something to do with it too. That is /ms does one replacement

      With and without g, both regexes do the same first substitution. With g, it proceeds to do other possible substitutions. That's the very definition of g.
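
      A small sketch that counts the substitutions, relying on s/// returning the number of replacements it made. The .+ variant at the end is my own suggestion rather than anything from the posts above; note that, unlike .*, it leaves an empty string untouched:

```perl
use strict;
use warnings;

my $once = "1234";
my $all  = "1234";
my $plus = "1234";

my $n_once = ($once =~ s/.*/go/s);    # no g: stop after the first match
my $n_all  = ($all  =~ s/.*/go/gs);   # g: also replace the empty match at the end
my $n_plus = ($plus =~ s/.+/go/gs);   # .+ cannot match the empty string

print "$once ($n_once)\n";   # go (1)
print "$all ($n_all)\n";     # gogo (2)
print "$plus ($n_plus)\n";   # go (1)
```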