You won't believe what this regular expression does!

by salva (Canon) on Feb 25, 2021 at 10:44 UTC (perlquestion #11128774)

salva has asked for the wisdom of the Perl Monks concerning the following question:

Well, at least, it has surprised me:
my $out = "hello\nfoo" =~ s/o*$/O/gmr; say $out
hellOO
fOO
What happens is that the regular expression first matches the o at the end of the line and replaces it, then matches the empty string at the end of the same line and replaces that as well.

But that's not what I would expect. I think that once $ is matched, Perl should continue matching in the next line.

What's your opinion?

Replies are listed 'Best First'.
Re: You won't believe what this regular expression does!
by haukex (Bishop) on Feb 25, 2021 at 11:32 UTC

    Surprising, yes, but I guess I'd call it a pitfall of combining * with a zero-width assertion. s/o*$/O/gm first matches the o in "hello\n": the $ assertion is satisfied, and since it's zero-width, the matching position is left just before the \n. Then it matches the empty string between "hello" and "\n", because o* (matching nothing) and $ are both satisfied. After that the regex engine tries the same empty match again, but the rules in Repeated Patterns Matching a Zero-length Substring kick in and the position is advanced past the \n instead.

    Update: You can see this in action with:

    perl -MRegexp::Debugger -e '"hello\nfoo"=~s/o*$/O/gmr'

    Update 2: Note the same thing happens again with foo: it first replaces oo with O, and then adds the second O because it matched the $ again. You can see this with "o\no".
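To make the sequence of matches visible without Regexp::Debugger, here is a minimal sketch (not part of the original reply) that walks the same pattern with m//g and records where each match starts and ends, using the built-in @- and @+ arrays. The variable names are only illustrative.

```perl
use strict;
use warnings;

my $str = "hello\nfoo";

# Collect [start, end, text] for every match that /o*$/gm finds.
my @matches;
while ($str =~ /o*$/gm) {
    my ($start, $end) = ($-[0], $+[0]);
    push @matches, [ $start, $end, substr($str, $start, $end - $start) ];
}

printf qq{start=%d end=%d text="%s"\n}, @$_ for @matches;
# start=4 end=5 text="o"
# start=5 end=5 text=""
# start=7 end=9 text="oo"
# start=9 end=9 text=""
```

Each line yields two matches: the run of o's, then an empty match at the same end-of-line position. Both get replaced under s///g, which is where the doubled O comes from.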

Re: You won't believe what this regular expression does!
by LanX (Sage) on Feb 25, 2021 at 13:01 UTC
    Let's dissect this into smaller problems.

    Simplification

    I tried to simplify the case to avoid misunderstandings

    DB<32> p "hello" =~ s/o*$/O/gr;
    hellOO
    DB<33> $_="hello"; s/o*$/O/g; print   # for older Perls
    hellOO

    Surprise: the o is replaced twice.

    Explanation so far

    You and Hauke already explained that

    • pos isn't changing after the first match b/c of the zero-width of $
    • the empty o* is matching again

    (And I agree that the referenced perlre#Repeated-Patterns-Matching-a-Zero-length-Substring needs a rewrite)

    DB<41> $_="hello"; say pos,"($1)" while m/(o*$)/g;   # pos doesn't change
    5(o)
    5()
    DB<42> p "hello" =~ s/x*$/O/gr;   # empty match (no x)
    helloO

    Disappointments

    Now, why is it surprising?

    I think your case is that $ in combination with the /m modifier should act differently. Correct?

    • Would this be consistent?
    • Are there already examples of zero-width assertions that behave that way?
    • Are there work-arounds to achieve what you want? (i.e. skipping zero-length matches)
    Workarounds

    Here's a guess for the last question:

    DB<44> p "hello\nfoo" =~ s/o*\n/O/gmr;
    hellOfoo
    DB<45> p "hello\nfoo\n" =~ s/o*\n/O/gmr;   # added \n at the end of input
    hellOfO
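Another possible workaround, which nobody in the thread proposed and is offered here only as a sketch: keep $ but add a negative lookbehind, so that a match can only start where the previous character is not an o. That rules out the empty match directly after the run of o's while leaving the newline in place. The helper name cap_line_ends is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is an assumption): replace the trailing run
# of o's on each line with a single O. The lookbehind (?<!o) only lets a match
# start where the previous character is not an o, so the empty match right
# after the run is rejected.
sub cap_line_ends {
    my ($text) = @_;
    return $text =~ s/(?<!o)o*$/O/gmr;   # the /r flag needs Perl 5.14+
}

print cap_line_ends("hello\nfoo"), "\n";   # prints hellO, then fO
print cap_line_ends("bar\nfoo"), "\n";     # prints barO, then fO
```

Unlike o+, this still appends an O to lines that do not end in o, which may or may not be what is wanted.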

    Meta

    Question @all: Is the problem better understood now? :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    edit

    added more code

    update

    added headlines for structuring

    Footnote: the empty o* matches again because an empty pattern always matches. Compare:

    DB<59> p "12345" =~ s/x*/ /gmr;
     1 2 3 4 5
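A small sketch of where those empty matches land, assuming nothing beyond core Perl: collecting pos after each successful m//g match shows one empty match at every position of the string, including the very end.

```perl
use strict;
use warnings;

my $str = "12345";

# /x*/ never finds an x here, so every match is empty; after each one the
# engine refuses a second empty match at the same spot and moves on by one.
my @positions;
while ($str =~ /x*/g) {
    push @positions, pos($str);
}

print "@positions\n";   # 0 1 2 3 4 5
```

Six empty matches for a five-character string, which is why the substitution inserts a space before every digit and one more after the last.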
      Are there work-arounds to achieve what you want?

      I sometimes use (?:\n|\z) to be explicit that I want the line endings to be consumed by the engine.

        Thanks! :)

        But please note the second fOO

        DB<55> p "hello\nfoo" =~ s/o*(?:\n|\z)/O/gmr;
        hellOfOO

        I'm busy right now, but I seem to remember that one could use features for atomic matches...

        I'll try later...

        Cheers Rolf

        update

        Nah, atomic matching doesn't help, since it's not a backtracking problem.

Re: You won't believe what this regular expression does!
by LanX (Sage) on Feb 25, 2021 at 15:09 UTC
    Didn't you rather want to use + instead of * ?
    DB<48> p "hello\nfoo" =~ s/o*$/O/gmr;
    hellOO
    fOO
    DB<49> p "hello\nfoo" =~ s/o+$/O/gmr;
    hellO
    fO
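The two quantifiers also differ on lines that do not end in o at all. A quick sketch with an input of my own choosing ("bar\nfoo" is not from the thread):

```perl
use strict;
use warnings;

my $input = "bar\nfoo";

# o* can match the empty string, so it also fires at the end of "bar".
my $star = $input =~ s/o*$/O/gmr;
# o+ needs at least one o, so "bar" is left alone.
my $plus = $input =~ s/o+$/O/gmr;

print "$star\n";   # prints barO, then fOO
print "$plus\n";   # prints bar, then fO
```

So o+ is the simpler fix when only lines that actually end in o should change.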

    Cheers Rolf

Re: You won't believe what this regular expression does!
by Anonymous Monk on Feb 25, 2021 at 11:32 UTC
    What perl exactly?
      This is perl 5, version 30, subversion 3 (v5.30.3) built for x86_64-linux-gnu-thread-multi
