http://qs1969.pair.com?node_id=11128780


in reply to You won't believe what this regular expression does!

Let's dissect this into smaller problems.

Simplification

I tried to simplify the case to avoid misunderstandings:

DB<32> p "hello" =~ s/o*$/O/gr;
hellOO
DB<33> $_="hello"; s/o*$/O/g; print # for older Perls
hellOO
DB<34>

Surprise: the result contains two O's, as if the o had been replaced twice.

Explanation so far

You and Hauke already explained that

(And I agree that the referenced perlre#Repeated-Patterns-Matching-a-Zero-length-Substring needs a rewrite)

DB<41> $_="hello"; say pos,"($1)" while m/(o*$)/g; # pos doesn't change
5(o)
5()
DB<42> p "hello" =~ s/x*$/O/gr; # empty match (no x)
helloO
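For anyone who prefers a script over the debugger, here is a minimal sketch of the same observation. It records the start and end offset of every match via @- and @+; the variable names ($text, @spans) are mine and only for illustration.

```perl
use strict;
use warnings;

my $text = "hello";
my @spans;    # one [start, end, matched text] entry per successful match

while ($text =~ /(o*)$/g) {
    # $-[0] and $+[0] hold the start and end offsets of the whole match
    push @spans, [ $-[0], $+[0], $1 ];
}

printf "start=%d end=%d text='%s'\n", @$_ for @spans;
```

This should report two matches: the "o" spanning offsets 4 to 5, and then an empty match sitting at offset 5. Those are the two places where s///g inserts an O.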

Disappointments

Now, why is it surprising?

I think your case is that $ in combination with the /m modifier should act differently. Correct?

Workarounds

Here is a guess for the last question:

DB<44> p "hello\nfoo" =~ s/o*\n/O/gmr;
hellOfoo
DB<45> p "hello\nfoo\n" =~ s/o*\n/O/gmr; # added \n at the end of input
hellOfO
DB<46>

Meta

Question @all: Is the problem better understood now? :)

Cheers Rolf
(addicted to the Perl Programming Language :)
Wikisyntax for the Monastery

edit

added more code

update

added headlines for structuring

) because empty patterns always match

compare

DB<59> p "12345" =~ s/x*/ /gmr;
 1 2 3 4 5 
DB<60>
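A small sketch to make the positions explicit, again with illustrative names of my own ($digits, @offsets). It collects the offset of every match of x* in a string that contains no x at all.

```perl
use strict;
use warnings;

my $digits = "12345";
my @offsets;    # start offset of every (empty) match of x*

while ($digits =~ /x*/g) {
    # every match is zero-length, so start and end offsets coincide
    push @offsets, $-[0];
}

print "matched at offsets: @offsets\n";
```

The pattern should match the empty string once at every offset from 0 to 5, including the very end of the string, which is why a space shows up before, between and after the digits.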

Re^2: You won't believe what this regular expression does!
by haukex (Bishop) on Feb 25, 2021 at 14:42 UTC
    Are there work-arounds to achieve what you want?

    I sometimes use (?:\n|\z) to be explicit that I want the line endings to be consumed by the engine.

      Thanks! :)

      But please note the second fOO

      DB<55> p "hello\nfoo" =~ s/o*(?:\n|\z)/O/gmr;
      hellOfOO
      DB<56>

      I'm busy right now, but I seem to remember that one could use features for atomic matches...

      I'll try later...

      Cheers Rolf

      update

      ) nah doesn't help, since it's not a backtracking problem.

        But please note the second fOO

        Yes, good point! I think the main question is what the intent of the regex is. If it's "replace any o's at the end of each line", then the better solution is, as you said, /o+$/, and using o* is the "mistake".
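To illustrate the o* versus o+ point, here is a minimal side-by-side sketch. It assumes the intent really is "replace any o's at the end of each line"; the variable names are only for illustration.

```perl
use strict;
use warnings;

my $input = "hello\nfoo";

# /o*$/ also matches the empty string left at the end of each line
(my $star = $input) =~ s/o*$/O/gm;

# /o+$/ needs at least one "o", so each line end is replaced at most once
(my $plus = $input) =~ s/o+$/O/gm;

print "o*: $star\n";
print "o+: $plus\n";
```

With o* every line should end in OO ("hellOO" and "fOO"), while o+ gives a single O per line ("hellO" and "fO"). The copy-then-substitute idiom is used instead of /r so the sketch also runs on older Perls.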