Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I need to fetch the last part of a URL.

http:\\myurlinfo\mypage.html

How would I write a regular expression to fetch just the HTML page name, such as mypage.html?
myfetch =~ /http:\\\w\w\$1/;
My attempt is not working. Please advise.

Re: Fetching last part of url
by ikegami (Patriarch) on Sep 09, 2004 at 17:03 UTC

    This is probably the safest way:

    use strict;
    use warnings;
    use URI;
    my $uri = URI->new('http://domain.com/myurlinfo/mypage.html');
    print(($uri->path_segments())[-1], "\n"); # prints "mypage.html"

    Note that the slashes in the example URL you provided lean the wrong way, and it doesn't have a domain even though it has a scheme. Mistakes like these, and subtler ones, are the reason it's usually better to use modules for parsing. The people who wrote the modules paid attention to the specs and the various possible formats.
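    As a rough illustration of the "subtler" mistakes mentioned above, the sketch below compares a plain "everything after the last slash" regex with path_segments on a URL that carries a query string and a fragment. It assumes the CPAN URI module is installed; the sample URL and the helper name page_name are made up for illustration. The regex keeps ?id=3#top attached to the name, while URI drops them because they are not part of the path.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use URI;

# Hypothetical helper: last path segment as URI parses it.
sub page_name {
    my ($url) = @_;
    my @segments = URI->new($url)->path_segments;   # query and fragment excluded
    return $segments[-1];
}

my $url = 'http://domain.com/myurlinfo/mypage.html?id=3#top';

# Naive approach: grab the run of non-'/' characters at the end of the string.
my ($naive) = $url =~ m{([^/]*)$};

print "regex: $naive\n";                 # regex: mypage.html?id=3#top
print "URI:   ", page_name($url), "\n";  # URI:   mypage.html
```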

Re: Fetching last part of url
by Limbic~Region (Chancellor) on Sep 09, 2004 at 16:59 UTC
    Anonymous Monk,
    #!/usr/bin/perl
    use strict;
    use warnings;
    use URI;
    my $uri = URI->new( 'http://www.foo.com/asdf/blah/bar.html' );
    print +($uri->path_segments)[-1];

    Cheers - L~R

Re: Fetching last part of url
by borisz (Canon) on Sep 09, 2004 at 16:59 UTC
    my $page;
    $something =~ m!([^/]*$)! and $page = $1;
    Update: fixed typo: from m!([^/!]*$)! to m!([^/]*$)!.
    Boris
      #!perl
      use strict;
      use warnings;
      my $myfetch = 'http:\\myurlinfo\mypage.html';
      $myfetch =~ m!([^/!]*$)! ? print $1 : print 'Whoops...';
      __END__
      Unmatched [ in regex; marked by <-- HERE in m/([ <-- HERE ^// at (eval + 1) line 7.
      Probably you meant:
      #!perl
      use strict;
      use warnings;
      my $myfetch = 'http:\\myurlinfo\mypage.html';
      $myfetch =~ m!([^\\]+)$! ? print $1 : print 'Whoops...';
      __END__
      mypage.html

      Cheers,
      CombatSquirrel.

      Entropy is the tendency of everything going to hell.
        Oops, no, I meant m!([^/]*$)!, sorry.
        Boris
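    The "Unmatched [ in regex" error in this subthread comes from the delimiter: in m!([^/!]*$)! the ! inside the character class closes the match operator early, so Perl only sees the unterminated pattern ([^/. A minimal sketch, reusing the sample URL from Limbic~Region's reply, shows that picking delimiters which do not occur in the pattern (braces here) lets ! stay an ordinary character inside the class.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $url = 'http://www.foo.com/asdf/blah/bar.html';

# With braces as delimiters, '!' inside the class is just a literal character,
# so the pattern compiles; it captures the run of non-'/' characters at the end.
my ($page) = $url =~ m{([^/!]*)$};

print "$page\n";   # bar.html
```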
Re: Fetching last part of url
by TheEnigma (Pilgrim) on Sep 09, 2004 at 16:59 UTC
    You're trying to make it too complicated. Just make use of the greedy nature of matches and get all the way to the last '\' (no matter how many there are), like this:

    $myfetch =~ /.+\\(.+)/;

    Also note that the $1 doesn't go in the match itself; you would use it later, perhaps by assigning it to a variable.

    Update: added the first .+ to the regex

    TheEnigma
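    To make the greedy point concrete, the sketch below runs the pattern from this reply against the question's URL and, for contrast, a non-greedy variant (.+?). The doubled backslashes are only there because the string is written as a single-quoted Perl literal; the value itself is http:\\myurlinfo\mypage.html. The greedy leading .+ backtracks only as far as the last backslash, while .+? stops at the first one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The actual string value is:  http:\\myurlinfo\mypage.html
my $myfetch = 'http:\\\\myurlinfo\\mypage.html';

# Greedy: .+ grabs as much as possible, so \\ ends up matching the LAST backslash.
my ($greedy) = $myfetch =~ /.+\\(.+)/;

# Non-greedy: .+? grabs as little as possible, so \\ matches the FIRST backslash.
my ($lazy) = $myfetch =~ /.+?\\(.+)/;

print "$greedy\n";   # mypage.html
print "$lazy\n";     # \myurlinfo\mypage.html
```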

Re: Fetching last part of url
by Eimi Metamorphoumai (Deacon) on Sep 09, 2004 at 17:01 UTC
    $url = 'http:\\myurlinfo\mypage.html';
    ($myfetch) = $url =~ /([^\\\/]+)$/;
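    Because the class [^\\\/] excludes both kinds of slash, the same pattern works whichever way the separators lean. A small sketch, wrapping it in a hypothetical helper named last_part and trying both spellings of the question's URL (the domain.com form is borrowed from ikegami's reply):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper around the pattern from the reply above:
# capture the trailing run of characters that are neither '\' nor '/'.
sub last_part {
    my ($url) = @_;
    my ($name) = $url =~ /([^\\\/]+)$/;
    return $name;   # undef when the string ends in a separator
}

print last_part('http:\\\\myurlinfo\\mypage.html'), "\n";          # mypage.html
print last_part('http://domain.com/myurlinfo/mypage.html'), "\n";  # mypage.html
```

    Since the + needs at least one character, a URL that ends in a separator (for example a bare directory like http://domain.com/dir/) gives undef rather than an empty string, so check the result before using it.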
Re: Fetching last part of url
by jbware (Chaplain) on Sep 09, 2004 at 17:09 UTC
    I'd probably recommend something like:
    ($pagename) = $myfetch =~ /http:\\\\.*\\([^\\]*)$/;
    -jbWare
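    One thing worth checking with this pattern: http:\\\\ requires two literal backslashes after http:. If the string is written as a single-quoted Perl literal, as in the other replies, \\ collapses to a single backslash and the match fails. The sketch below (the helper name page_of is hypothetical) runs the pattern against both spellings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern from the reply above.
sub page_of {
    my ($myfetch) = @_;
    my ($pagename) = $myfetch =~ /http:\\\\.*\\([^\\]*)$/;
    return $pagename;
}

# In single quotes \\ becomes one backslash: value is  http:\myurlinfo\mypage.html
my $literal = 'http:\\myurlinfo\mypage.html';

# Doubling every backslash gives the value  http:\\myurlinfo\mypage.html
my $doubled = 'http:\\\\myurlinfo\\mypage.html';

for my $s ($literal, $doubled) {
    my $name = page_of($s);
    print "$s => ", (defined $name ? $name : 'no match'), "\n";
}
```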
      Thanks for all your quick replies!