ip9 has asked for the wisdom of the Perl Monks concerning the following question:

Hello, I am currently working on a little task to gather URLs. I have used the LinkExtor thingy to get all the A tags from a document, and I now wish to pick up just the file name from the end of a URL (e.g. http://www.fubar.com/fileXXX.txt).

I know what the extension of the file will be. I think I just need an RE to get everything after the last "/" in the URL. Help please.

Replies are listed 'Best First'.
Re: Get the filename from the end of a URL
by davorg (Chancellor) on May 24, 2001 at 14:52 UTC
Re: Get the filename from the end of a URL
by Vynce (Friar) on May 24, 2001 at 17:07 UTC

    first: ++davorg for already providing the really real right answer. in perl, there's always MTOWTDI, but some are better than others, and ones other people have already written properly and debugged are often best.

    if you really want a regex to get everything after the last /, i think the easiest way is

    (my $file = $url) =~ s~^.*/~~;
    though maybe you prefer
    my ($file) = $url =~ m~([^/]*)$~;
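
    For anyone wanting to try the two regexes side by side, here is a minimal sketch. The sub names are made up for illustration, and the sample URL is the one from the original question. Note that both forms return an empty string when the URL ends in a "/".

```perl
use strict;
use warnings;

# Hypothetical helper names, for illustration only.

# Copy the URL, then delete everything up to and including the last "/".
sub name_by_substitution {
    my ($url) = @_;
    (my $file = $url) =~ s~^.*/~~;
    return $file;
}

# Capture the trailing run of non-slash characters.
sub name_by_capture {
    my ($url) = @_;
    my ($file) = $url =~ m~([^/]*)$~;
    return $file;
}

my $url = 'http://www.fubar.com/fileXXX.txt';
print name_by_substitution($url), "\n";   # fileXXX.txt
print name_by_capture($url), "\n";        # fileXXX.txt
```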

    i also wanted to point out that in BigJoe's solution it is more idiomatically perl to say $array[-1] than $array[$#array]. this is useful to me because it may not always be clear how to get the last valid index of an array that is, say, stored as the second item in an anonymous array reffed by a hash entry... meaning $hash{stuff}[1]. but [-1] always works.
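
    A small sketch of the nested case described above, assuming some made-up data in %hash. The helper name last_piece is hypothetical and only shows the same negative index applied to the list returned by split.

```perl
use strict;
use warnings;

# Made-up data for illustration: the second item of the anonymous array
# stored under "stuff" is itself an array reference.
my %hash = ( stuff => [ 'first item', [ 'a.txt', 'b.txt', 'c.txt' ] ] );

# Last element via the last valid index: the inner array has to be
# dereferenced a second time just to compute that index.
my $long = $hash{stuff}[1][ $#{ $hash{stuff}[1] } ];

# Last element via a negative index: no second dereference needed.
my $short = $hash{stuff}[1][-1];

print "$long $short\n";   # c.txt c.txt

# The same idea applied to the pieces of a URL (hypothetical helper name).
sub last_piece {
    my ($url) = @_;
    my $last = ( split m{/}, $url )[-1];
    return $last;
}

print last_piece('http://www.your.com/your/url/text.txt'), "\n";   # text.txt
```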

    update: as per the tachyon beam. no, i didn't test it because at the moment i am away from easy perl access. mea culpa. fixx0red.

      my ($file) = $url =~ m~[^/]*$~;
      You didn't test this, did you? You need to capture:
      my ($file) = $url =~ m~([^/]*)$~;
      tachyon
Re: Get the filename from the end of a URL
by BigJoe (Curate) on May 24, 2001 at 16:31 UTC
    $URL = "http://www.your.com/your/url/text.txt";
    my @list = split(/\//, $URL);
    print $list[$#list];


    --BigJoe

    Learn patience, you must.
    Young PerlMonk, craves Not these things.
    Use the source Luke.
Re: Get the filename from the end of a URL
by mr.nick (Chaplain) on May 24, 2001 at 17:13 UTC
    File::Basename works just fine for URLs as well as real filenames:
    use File::Basename;
    print basename 'http://www.plenty.org/downloads/crap.exe';
    Update: This does break if the URL is something like http://www.plenty.org/downloads/crap.exe?foo=/path/to/something. But so do the other examples, I believe.
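
    One possible workaround for the query-string case, as a rough sketch: cut the URL at the first "?" or "#" before handing it to basename. The sub name url_basename is made up, and this is a simplification rather than a full URL parser.

```perl
use strict;
use warnings;
use File::Basename;

# Hypothetical helper: drop any query string or fragment, then take the
# basename of what is left. A simplification, not a real URL parser.
sub url_basename {
    my ($url) = @_;
    (my $path = $url) =~ s/[?#].*//s;   # cut from the first "?" or "#" onward
    return basename($path);
}

print url_basename('http://www.plenty.org/downloads/crap.exe?foo=/path/to/something'), "\n";
print url_basename('http://www.plenty.org/downloads/crap.exe'), "\n";
```

    Both calls should print crap.exe, since the slashes inside the query string are gone before basename looks for the last one.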
      yesbut, what if the URL is ...?
      http://www.plenty.org/downloads/crap.exe?foo=/path/to/something

      xoxo,
      Andy

      %_=split/;/,".;;n;u;e;ot;t;her;c; ".   #   Andy Lester
      'Perl ;@; a;a;j;m;er;y;t;p;n;d;s;o;'.  #   http://petdance.com
      "hack";print map delete$_{$_},split//,q<   andy@petdance.com   >
      
Re: Get the filename from the end of a URL
by Beatnik (Parson) on May 24, 2001 at 16:26 UTC
    Try this, altho I'd recommend using davorg's solution...
    #!/usr/bin/perl -w
    use strict;
    my $url = "http://www.oreilly.com/catalog/lperl3/index.html";
    my ($page) = $url =~ /.*\/([^\/]*)/;
    print $page;
    Greetz
    Beatnik
    ... Quidquid perl dictum sit, altum viditur.
Re: Get the filename from the end of a URL
by tachyon (Chancellor) on May 24, 2001 at 16:33 UTC

    This does what you want

    $url = 'http://www.fubar.com/fileXXX.txt';
    # name and extension
    ($file) = $url =~ m|^.*/(.*)$|;
    print "$file\n";
    # if you want the name without the extension
    $ext ='txt';
    ($file) = $url =~ m|^.*/(.*)\.$ext$|o;
    print "$file\n";

    Note that with the second RE you can hard-code the extension in place of $ext if you want. The o at the end is a compile-once directive. It makes the regex faster, but it means that if you change $ext on the fly this regex will not care! Just remove the o if you need to do this.
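
    If the extension does need to change on the fly, something along these lines might do. It is only a sketch: the sub name name_without_ext is made up, the o is left off so the current $ext is used on every call, and the \Q...\E quoting is an addition of mine so that regex metacharacters in $ext are taken literally.

```perl
use strict;
use warnings;

# Hypothetical helper: file name from a URL, minus a given extension.
# Returns undef when the URL does not end in ".$ext".
sub name_without_ext {
    my ($url, $ext) = @_;
    # No /o here, so the current value of $ext is used on every call.
    # \Q...\E (not in the original regex) quotes metacharacters in $ext.
    my ($file) = $url =~ m{^.*/(.*)\.\Q$ext\E$};
    return $file;
}

print name_without_ext('http://www.fubar.com/fileXXX.txt', 'txt'), "\n";   # fileXXX
print name_without_ext('http://www.fubar.com/index.html', 'html'), "\n";   # index
```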

    cheers

    tachyon

Re: Get the filename from the end of a URL
by Big Willy (Scribe) on May 24, 2001 at 16:19 UTC
    This might work:
    $url = 'http://www.fubar.com/fileXXX.txt';
    $url =~ m{^[\w\/\:\.]+\/([\w\.]+)$};
    $file = $1;