Re: Non-greedy match end of line bug?

by Fletch (Bishop)
on Oct 26, 2021 at 01:31 UTC ( [id://11138037]=note )


in reply to Non-greedy match end of line bug?

So reading perlre, a $ matches:

  • at the end of the string (your first case)
  • or before newline at the end of the string (what happens in your second case, leaving the newline out of $1)
  • before any newline if you've used /m

Stepping through the match with (say) Regexp::Debugger and watching what the engine does might help.

The cake is a lie.

Re^2: Non-greedy match end of line bug?
by am12345 (Novice) on Oct 26, 2021 at 01:48 UTC
    Well... I would buy it "by default" or with the /m qualifier, but I did add the /s qualifier. With that, I think \n is supposed to be treated just like any other character. Should it not?
    /s: Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.

      You're not taking into account the non-greediness. To accommodate matching the $ (which is before the newline) $1 holds 'foo'. If you also want to match the terminal newline, use \z instead of $:

      $ perl -e '$_="foo\n"; print "1=|$1|\n" if m/(fo.+?)$/s'
      1=|foo|
      $ perl -e '$_="foo\n"; print "1=|$1|\n" if m/(fo.+?)\z/s'
      1=|foo
      |

      Your comparison with the first one-liner is not comparing apples with apples:

      $ perl -e '$_="foo\nbar"; print "1=|$1|\n" if m/(fo.+?)$/s'
      1=|foo
      bar|
      $ perl -e '$_="foo\nbar\n"; print "1=|$1|\n" if m/(fo.+?)$/s'
      1=|foo
      bar|
      $ perl -e '$_="foo\nbar\n"; print "1=|$1|\n" if m/(fo.+?)\z/s'
      1=|foo
      bar
      |

      I also second ++Fletch's recommendation to use Regexp::Debugger. This allows you to step through the matching process and see exactly what's happening. I often use it myself.

      — Ken

        I get what's going on now, thank you.

        I still think it's a bug, or at the very least a major implementation quirk that is incompatible with other regex implementations. JavaScript and Golang treat /s the intuitive way and don't make an exception for \n at the end of a string.

        Type this into any browser console:

        "foo\nbar".match(/(fo.+?)$/s) && RegExp.$1
        "foo\n".match(/(fo.+?)$/s) && RegExp.$1

        Or try it on regex101.com - you get different matching results on PCRE vs non-PCRE based engines.

        I think it warrants a big warning in perlre. It was a nasty surprise for me even though I am far from being a Perl novice.
