in reply to help with lazy matching

Think about the regex from left to right. It matches at the first slash, then you tell it to match any characters (as few as possible), and then it must match at end-of-string/line. The lazy .+? simply keeps extending one character at a time until $ can match, so from the regex engine's point of view it has completed the match starting at that first slash. You can watch this happen by running the following from the command line: "perl -Mre=debug -wMstrict -le '"/foo/bar/baz/bat"=~/\/(.+?)$/; print $1'". The quickest fix I can think of off the top of my head is to change your dot (.) to [^\/].
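To make the difference concrete, here is a minimal sketch comparing the lazy dot with the negated character class on the same string. The variable names are only for illustration and are not taken from the original post.

```perl
use strict;
use warnings;

my $path = '/foo/bar/baz/bat';

# Lazy dot anchored with $: the match still starts at the first slash,
# so .+? is forced to grow until $ can match.
my ($lazy) = $path =~ /\/(.+?)$/;

# Negated character class: the capture cannot cross a slash, so the
# only place the pattern can succeed is after the last slash.
my ($last) = $path =~ /\/([^\/]+)$/;

print "lazy dot:      $lazy\n";   # foo/bar/baz/bat
print "negated class: $last\n";   # bat
```

The second pattern works not because it is "lazier" but because every earlier starting slash fails to reach the end of the string without crossing another slash.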

The ? would be applicable in the case when your regex wasn't anchored to the end of the string, for example:

$ perl -wMstrict -le '"/foo/bar/baz/bat"=~/\/(.+)\//; print $1'
foo/bar/baz
$ perl -wMstrict -le '"/foo/bar/baz/bat"=~/\/(.+?)\//; print $1'
foo

Also: Is /foo/bar/baz/bat supposed to be a filename? Because if yes, I would really strongly recommend that you use fileparse from File::Basename; there are a few other possible modules but this one is in the core so it should always be available. For example:

use File::Basename 'fileparse';
my $filename = fileparse("/foo/bar/baz/bat");
print "$filename\n";
__END__
bat

And by the way, I think the ? is more commonly referred to as making the expression "non-greedy".

Re^2: help with lazy matching
by nlwhittle (Beadle) on Jan 05, 2015 at 22:19 UTC

    ++ on this comment, except that I don't think you need to escape the slash inside the negated character class. You can just write [^/] .

    Also (for the original post), you can leave out the '$_ =~' from your if statement if you want. Since there is no explicit variable in your while loop test, the if statement will match $_ by default.
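    The original loop is not shown in this thread, so the following is only a sketch of the idea under assumptions: the in-memory filehandle, the sample lines, and the @names array are all hypothetical stand-ins.

```perl
use strict;
use warnings;

# Hypothetical input: an in-memory filehandle standing in for a real file.
my $input = "/foo/bar/baz/bat\n/usr/local/bin/perl\n";
open my $fh, '<', \$input or die "Cannot open in-memory file: $!";

my @names;
while (<$fh>) {              # a bare readline in while() assigns to $_
    chomp;                   # chomp also defaults to $_
    if (/([^\/]+)$/) {       # same as: if ($_ =~ /([^\/]+)$/)
        push @names, $1;
    }
}
close $fh;

print "$_\n" for @names;     # bat, then perl
```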

    --Nick

      You need to escape the forward-slash only if this character is used as the regex delimiter character.

      c:\@Work\Perl\monks>perl -wMstrict -le "$_ = '/foo/bar/baz/bat'; print qq{'$1'} if /([^/]+)$/; "
      Unmatched [ in regex; marked by <-- HERE in m/([ <-- HERE ^/ at ...
      c:\@Work\Perl\monks>perl -wMstrict -le "$_ = '/foo/bar/baz/bat'; print qq{'$1'} if m{([^/]+)$}; "
      'bat'


      Give a man a fish:  <%-(-(-(-<

Re^2: help with lazy matching
by Special_K (Pilgrim) on Jan 05, 2015 at 22:17 UTC

    I guess my thinking was that with a non-greedy modifier, my regular expression could use the slash before "bat" to match the slash, then it would match "bat" as the .+, and then finally it would match the end of line character in the file as the $.

    Why does it not work that way?

      The non-greedy modifier simply means "match as little as possible while still getting a successful match". Perl's regex engine always prefers the leftmost possible match, which in your case starts at the first slash. Where the non-greedy modifier would help, for example, is if you wanted to match only 'foo'. Then you could write:

      if ( /\/(.+?)\// )

      This will match the first slash, then non-greedily match any other characters until another slash is reached. If you didn't use the non-greedy modifier here, you would match everything between the first and last slash (i.e. 'foo/bar/baz').

      --Nick
        I think the source of my confusion was not knowing that regular expressions in Perl always start matching from the left side. If the regular expression could start matching from anywhere, then using the non-greedy modifier could give the behavior I was expecting in my original post, i.e. matching "bat".

      I like the description in the Camel:

      ... regular expressions will try to match as early as possible. This even takes precedence over being greedy. Since scanning happens left to right, the pattern will match as far left as possible, even if there is some other place where it could match longer. (Regular expressions may be greedy, but they aren’t into delayed gratification.) ...

      (copied from the free sample material on the O'Reilly website, http://cdn.oreillystatic.com/oreilly/booksamplers/9780596004927_sampler.pdf, book page 44)

      Another key thing to realize is that the $ does not change the behavior to scanning from right-to-left.
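      One way to see this for yourself is to look at where the match begins, using the built-in @- array ($-[0] is the offset at which the whole match starts). This is a small sketch; the second pattern with a greedy prefix is my own illustrative alternative, not something proposed earlier in the thread.

```perl
use strict;
use warnings;

my $path = '/foo/bar/baz/bat';

# The engine tries start position 0 first and only moves right if
# no match is possible from there; $ is just a condition to satisfy.
$path =~ /\/(.+?)$/ or die "no match";
my ($start, $captured) = ($-[0], $1);

print "match starts at offset $start\n";   # 0
print "captured: $captured\n";             # foo/bar/baz/bat

# Hypothetical alternative: a greedy .* in front consumes as much as it
# can, so the slash that follows it ends up being the last one.
$path =~ /.*\/(.+)$/ or die "no match";
my $tail = $1;
print "after greedy prefix: $tail\n";      # bat
```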

      Why does it not work that way?

      The regex metacharacter dot (.) matches any character (except newline by default, or including newline under the /s modifier).

      It starts matching right after the first / is matched, and since / is just another character to it, it matches all subsequent / as well.

      This is a FAQ, but a hard one to search for :)

      use re 'debug'; and watch it work

      use rxrx (the command-line tool that comes with Regexp::Debugger) and watch it work