PerlMonks  

You won't believe what this regular expression does!

by salva (Canon)
on Feb 25, 2021 at 10:44 UTC ( [id://11128774]=perlquestion )

salva has asked for the wisdom of the Perl Monks concerning the following question:

Well, at least, it has surprised me:
    my $out = "hello\nfoo" =~ s/o*$/O/gmr; say $out
    hellOO
    fOO
What is happening is that the regular expression matches the o at the end of the line and replaces it, then matches the empty string at the end of the same line and replaces that as well.

But that's not what I would expect. I think that once $ has matched, Perl should continue matching on the next line.

What's your opinion?

Replies are listed 'Best First'.
Re: You won't believe what this regular expression does!
by haukex (Archbishop) on Feb 25, 2021 at 11:32 UTC

    Surprising, yes, but I guess I'd call it a pitfall of combining * with a zero-width assertion. s/o*$/O/gm first matches the o in "hello\n", where the $ assertion is satisfied; since $ is zero-width, the matching position is left just before the \n. Then it matches the empty string between that o and the \n, because o* and $ are both satisfied there. After that the regex engine would match the same empty string again, but the rules in Repeated Patterns Matching a Zero-length Substring kick in, advancing the position past the \n instead.

    Update: You can see this in action with:

    perl -MRegexp::Debugger -e '"hello\nfoo"=~s/o*$/O/gmr'

    Update 2: Note that the same thing happens again with foo: it first replaces oo with O, and then adds the second O because it matched the $ again. You can see this with "o\no".
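The sequence of matches described above can also be listed without the debugger. This is only a minimal sketch: it walks the string with m//g in scalar context, which visits the same matches that s///g would replace, and records each match's start offset and length from the built-in @- and @+ arrays. The variable names are illustrative.

```perl
use strict;
use warnings;

my $text = "hello\nfoo";
my @trace;

# Each iteration is one match of /o*$/gm. $-[0] is the start offset of
# the match and $+[0] the end offset, so their difference is the length.
while ($text =~ /o*$/gm) {
    push @trace, sprintf "offset %d, length %d", $-[0], $+[0] - $-[0];
}

print "$_\n" for @trace;
# offset 4, length 1   (the "o" of hello)
# offset 5, length 0   (empty string before \n)
# offset 7, length 2   (the "oo" of foo)
# offset 9, length 0   (empty string at end of string)
```

Each line gets one non-empty match followed by one zero-length match at the position where the previous match ended, which is why every line ends up with two O's.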

Re: You won't believe what this regular expression does!
by LanX (Saint) on Feb 25, 2021 at 13:01 UTC
    Let's dissect this into smaller problems.

    Simplification

    I tried to simplify the case to avoid misunderstandings

    DB<32> p "hello" =~ s/o*$/O/gr;
    hellOO
    DB<33> $_="hello"; s/o*$/O/g; print   # for older Perls
    hellOO
    DB<34>

    Surprise: the o is replaced twice.

    Explanation so far

    You and Hauke already explained that

    • pos isn't changing after the first match b/c of the zero-width of $
    • the empty o* is matching again °

    (And I agree that the referenced perlre#Repeated-Patterns-Matching-a-Zero-length-Substring needs a rewrite)

    DB<41> $_="hello"; say pos,"($1)" while m/(o*$)/g;   # pos doesn't change
    5(o)
    5()
    DB<42> p "hello" =~ s/x*$/O/gr;   # empty match (no x)
    helloO

    Disappointments

    Now, why is it surprising?

    I think your case is that $ in combination with the /m modifier should act differently. Correct?

    • Would this be consistent?
    • Are there already examples of zero-width assertions that do it that way?
    • Are there work-arounds to achieve what you want? (i.e. skipping zero-length matches)

    Workarounds

    Here is a guess for the last question

    DB<44> p "hello\nfoo" =~ s/o*\n/O/gmr;
    hellOfoo
    DB<45> p "hello\nfoo\n" =~ s/o*\n/O/gmr;   # added \n at the end of input
    hellOfO
    DB<46>
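Another possible workaround, not proposed anywhere in this thread and offered only as a sketch, is to keep $ but add a negative lookbehind so that a match may not start directly after an o. That rejects the empty match that would otherwise follow a replaced run of o's, while still allowing an empty match on lines that do not end in o. The helper name one_O_per_line is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: put exactly one "O" at the end of each line,
# swallowing any trailing run of o's. (?<!o) forbids starting a match
# right after an "o", so no second, empty match follows the first one.
sub one_O_per_line {
    my ($in) = @_;
    return $in =~ s/(?<!o)o*$/O/gmr;
}

print one_O_per_line("hello\nfoo"), "\n";   # hellO / fO
print one_O_per_line("hell\nfoo"),  "\n";   # hellO / fO
```

Unlike the \n-based variants, this leaves the newlines in place. One caveat under /m: $ also matches at the very end of the string, so input that ends in a newline would get an extra O after that final newline.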

    Meta

    Question @all: Is the problem better understood now? :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    edit

    added more code

    update

    added headlines for structuring

    °) because an empty pattern always matches

    compare

    DB<59> p "12345" =~ s/x*/ /gmr;
    1 2 3 4 5
    DB<60>
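To make the footnote's point easier to see, here is a small sketch that counts the empty matches and prints the result between brackets, so the spaces inserted at the very start and very end of the string are visible too. The variable names are illustrative.

```perl
use strict;
use warnings;

# x* can only match the empty string here, and it does so once at every
# position, including before the first and after the last character.
my $out = "12345" =~ s/x*/ /gr;
print "[$out]\n";                    # [ 1 2 3 4 5 ]

# m//g in list context returns every match; assigning through an empty
# list yields the number of matches.
my $count = () = "12345" =~ /x*/g;
print "$count empty matches\n";      # 6 empty matches
```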
      Are there work-arounds to achieve what you want?

      I sometimes use (?:\n|\z) to be explicit that I want the line endings to be consumed by the engine.

        Thanks! :)

        But please note the second fOO

        DB<55> p "hello\nfoo" =~ s/o*(?:\n|\z)/O/gmr;
        hellOfOO
        DB<56>

        I'm busy right now, but I seem to remember that one could use features for atomic matches...

        I'll try later...°

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery

        update

        °) nah doesn't help, since it's not a backtracking problem.

Re: You won't believe what this regular expression does!
by LanX (Saint) on Feb 25, 2021 at 15:09 UTC
    Didn't you rather want to use + instead of * ?
    DB<48> p "hello\nfoo" =~ s/o*$/O/gmr;
    hellOO
    fOO
    DB<49> p "hello\nfoo" =~ s/o+$/O/gmr;
    hellO
    fO
    DB<50>

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery
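Whether + is a drop-in replacement depends on what should happen to lines that do not end in o. A short sketch comparing the two quantifiers on such input (the sample string is made up for illustration):

```perl
use strict;
use warnings;

my $in = "hell\nfoo";    # first line has no trailing "o"

# With *, the empty string at the end of "hell" still matches, so that
# line gets an O too, and "foo" gets its doubled O as before.
my $star = $in =~ s/o*$/O/gmr;

# With +, at least one "o" is required, so "hell" is left untouched.
my $plus = $in =~ s/o+$/O/gmr;

print "$star\n";   # hellO / fOO
print "$plus\n";   # hell / fO
```

So + avoids the doubled O, but it also stops appending an O to lines without a trailing o, which may or may not be what the original pattern was meant to do.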

Re: You won't believe what this regular expression does!
by Anonymous Monk on Feb 25, 2021 at 11:32 UTC
    What perl exactly?
      This is perl 5, version 30, subversion 3 (v5.30.3) built for x86_64-linux-gnu-thread-multi

Node Type: perlquestion [id://11128774]
Approved by 1nickt
Front-paged by 1nickt