awohld has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to use a regex to get the filename of this URL: http://txs.corp.com:8080/area/es/2215.csv.gz

Why isn't this doing a non-greedy match and getting "2215.csv.gz" in the scalar $gzip?

I know I can use a module for this, but I'm trying to figure out what's wrong with my regex knowledge.
my $file = 'http://txs.corp.com:8080/area/es/2215.csv.gz';
my ( $gzip ) = $file =~ m/\/(.*?\.gz)$/;

Re: Why is this greedy matching?
by toolic (Bishop) on Mar 23, 2012 at 14:41 UTC
    File::Basename is easier:
    use warnings;
    use strict;
    use File::Basename;

    my $file = 'http://txs.corp.com:8080/area/es/2215.csv.gz';
    my $gzip = basename($file, qr/\.gz/);
    print "$gzip\n";

    __END__
    2215.csv.gz

    Demystify regular expressions by installing and using the CPAN module YAPE::Regex::Explain (Tip #9 from Basic debugging checklist)

    The regular expression:

    (?-imsx:/(.*?\.gz)$)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      /                        '/'
    ----------------------------------------------------------------------
      (                        group and capture to \1:
    ----------------------------------------------------------------------
        .*?                      any character except \n (0 or more times
                                 (matching the least amount possible))
    ----------------------------------------------------------------------
        \.                       '.'
    ----------------------------------------------------------------------
        gz                       'gz'
    ----------------------------------------------------------------------
      )                        end of \1
    ----------------------------------------------------------------------
      $                        before an optional \n, and the end of the
                               string
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

Re: Why is this greedy matching?
by BrowserUk (Patriarch) on Mar 23, 2012 at 15:10 UTC

    If you don't want to capture any slashes, say that!:

    print 'http://txs.corp.com:8080/area/es/2215.csv.gz' =~ m[ ( [^/]+ ) $ ]x;

    Output:
    2215.csv.gz
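
    The negated character class above can be wrapped in a small helper to see how it behaves on a few inputs. This is only a sketch: last_segment is a hypothetical name, not something from the thread, and the extra test strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the text after the
# last "/", or undef when there is nothing after the final slash.
sub last_segment {
    my ($url) = @_;

    # [^/]+ cannot cross a slash, so anchoring it with $ pins the
    # capture to the final path segment.
    my ($segment) = $url =~ m{([^/]+)$};
    return $segment;
}

for my $url (
    'http://txs.corp.com:8080/area/es/2215.csv.gz',
    '2215.csv.gz',                          # no slash at all
    'http://txs.corp.com:8080/area/es/',    # trailing slash, no file
) {
    my $segment = last_segment($url);
    printf "%-46s => %s\n", $url, defined $segment ? $segment : '(no match)';
}
```

    Because the class excludes "/", greediness stops being a concern: the capture simply has nowhere to go beyond the last slash.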


Re: Why is this greedy matching? (leftmost)
by tye (Sage) on Mar 23, 2012 at 14:48 UTC

    "First match" (leftmost) trumps "shortest match" (anti-greedy). You can use a greedy .* to replace "leftmost (sub)match" with "rightmost (sub)match" to end up with "shortest match" in cases like this:

    m{.*/(.*\.gz)$}

    Note that it doesn't matter whether you make the part inside the parens greedy or not here.
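
    A short sketch comparing the original pattern with the greedy-prefix version, using the URL from the question. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $file = 'http://txs.corp.com:8080/area/es/2215.csv.gz';

# Original pattern: the match starts at the first "/" that works, so
# the non-greedy .*? still has to stretch all the way to the final ".gz".
my ($original) = $file =~ m{/(.*?\.gz)$};

# Greedy prefix: .* grabs as much as it can and backtracks, so the "/"
# it settles on is the last one that lets the rest of the pattern match.
my ($greedy_inside)     = $file =~ m{.*/(.*\.gz)$};
my ($non_greedy_inside) = $file =~ m{.*/(.*?\.gz)$};

print "original:          $original\n";
print "greedy inside:     $greedy_inside\n";
print "non-greedy inside: $non_greedy_inside\n";
```

    The first line should show the capture starting at the second slash of "http://", while the other two should both print 2215.csv.gz.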

    - tye        

Re: Why is this greedy matching?
by ww (Archbishop) on Mar 23, 2012 at 14:35 UTC
    ...perhaps because the deathstar (.*?) in your capture starts right after the first "/" in "http://" and has to match everything from there to the final ".gz"
Re: Why is this greedy matching?
by JavaFan (Canon) on Mar 23, 2012 at 20:29 UTC
    Leftmost trumps non-greedy.
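
    The same rule can be seen on a toy string, where it is easier to follow than on the URL. A minimal sketch: the string 'xaaab' is made up, and $-[1] is the built-in offset of where capture group 1 started.

```perl
use strict;
use warnings;

my $text = 'xaaab';

# a+? is non-greedy, but the engine tries start positions from left to
# right and accepts the first one where the whole pattern can match.
my ($lazy) = $text =~ /(a+?b)/;
my $start  = $-[1];    # offset where capture group 1 began

print "captured '$lazy' starting at offset $start\n";
```

    A non-greedy quantifier only chooses the shortest extension from a given starting point. It never moves the starting point to the right to make the overall match shorter, which is why this captures 'aaab' rather than 'ab'.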