tanger has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

Just wondering how you would retrieve the image name from a URL into a scalar.

For example, if I wanted $image_name to be equal to "image.jpg" from this URL:

http://www.made_up_name.com/image.jpg

So let's say:

$url = "http://www.made_up_name.com/image.jpg";

What regex am I looking for? I'm not very good with regexes, so it's hard for me to figure out something like this. Simple matching, or replacing one word with another, I can do :)

thanks!

Re: Regex simple quicky question :)
by davido (Cardinal) on Oct 31, 2004 at 03:01 UTC

    Assuming the image name doesn't have any spaces in it, you could do it like this:

    if ( $url =~ m!/([^/]+\.(?:jpe?g|gif|png|tiff?)\b)!i ) { print "Found $1\n"; }

    The way this works is as follows:

    m!                  # Use an alternate delimiter.
        /               # Match a '/' character to anchor off of.
        (               # Start a capturing paren.
            [^/]+       # Match one or more non-'/' characters.
            \.          # Match the '.' character.
            (?:         # Group or constrain without capturing.
                jpe?g   #   jpg or jpeg
              | gif     #   or gif
              | png     #   or png
              | tiff?   #   or tif or tiff
            )           # End the grouping/constraining parens.
            \b          # Make sure that tiff isn't tiffany, for
                        # example.
        )               # End the capture.
    !ix                 # Case insensitive /i (ignore the x; it's only
                        # necessary in this expanded example).

    If the URL is encoded, so that spaces have become '+' characters, etc., you will need to decode the URL first, or else filenames containing spaces will be mangled. That particular issue isn't a regexp issue, just the way URLs get encoded.
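
    As a rough illustration of the decoding step, here is a minimal sketch that uses only core Perl. It assumes form-style encoding, where '+' stands for a space and %XX stands for a single byte. The helper names decode_url_part and image_name are made up for this example, and the sketch captures the raw name first and decodes it afterwards, which is only one possible ordering.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): undo form-style URL encoding.
sub decode_url_part {
    my ($text) = @_;
    $text =~ tr/+/ /;                              # '+' stands for a space
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;  # %XX -> the byte it encodes
    return $text;
}

# Hypothetical helper: capture the raw file name with the regex above,
# then decode it. Returns undef (in scalar context) when nothing matches.
sub image_name {
    my ($url) = @_;
    return unless $url =~ m!/([^/]+\.(?:jpe?g|gif|png|tiff?)\b)!i;
    return decode_url_part($1);
}

print image_name('http://www.made_up_name.com/my+image%20two.jpg'), "\n";
```

    Decoding after the capture keeps an encoded slash (%2F) from being mistaken for a path separator. For anything beyond a quick script, a dedicated URL-handling module is probably the safer choice.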


    Dave

Re: Regex simple quicky question :)
by atcroft (Abbot) on Oct 31, 2004 at 04:21 UTC

    If you want to avoid regexes, you could use the URI module's path method in combination with the File::Spec::Unix module's splitpath function to get the filename. For example:

    #!/usr/bin/perl -w
    use strict;
    use File::Spec::Unix;
    use URI;

    #
    # url: http://www.made_up_name.com/somepath/script.php/image.jpg?p=1&q=2#name
    #
    # $u->scheme:   http
    # $u->userinfo: (undef)
    # $u->host:     www.made_up_name.com
    # $u->port:     80
    # $u->path:     /somepath/script.php/image.jpg
    # $u->query:    p=1&q=2
    # $u->fragment: name
    #
    my $url = join(
        '://', 'http',
        join(
            '/',
            'www.made_up_name.com',
            'somepath',
            'script.php',
            join( '?', 'image.jpg', join( '#', join( '&', 'p=1', 'q=2' ), 'name' ) )
        )
    );
    my $u = URI->new($url);

    #
    # $u->path: /somepath/script.php/image.jpg
    #
    # $volume:      ''
    # $directories: /somepath/script.php/
    # $file:        image.jpg
    #
    ( my $volume, my $directories, my $file ) =
      File::Spec::Unix->splitpath( $u->path );

    printf << "URI_PARTS", $url, ( defined( $u->scheme ) ? $u->scheme : '(undef)' ), ( defined( $u->userinfo ) ? $u->userinfo : '(undef)' ), ( defined( $u->host ) ? $u->host : '(undef)' ), ( defined( $u->port ) ? $u->port : '(undef)' ), ( defined( $u->path ) ? $u->path : '(undef)' ), ( defined( $u->query ) ? $u->query : '(undef)' ), ( defined( $u->fragment ) ? $u->fragment : '(undef)' );
    URI PARTS
    URL: %s
    SCHEME: %s
    USERINFO: %s
    HOST: %s
    PORT: %s
    PATH: %s
    QUERY: %s
    FRAGMENT: %s
    URI_PARTS

    printf << "PATH_PARTS", ( defined($u->path) ? $u->path : '(undef)' ), ( defined($volume) ? $volume : '(undef)' ), ( defined($directories) ? $directories : '(undef)' ), ( defined($file) ? $file : '(undef)' );
    PATH FRAGMENTS
    PATH: %s
    VOLUME: %s
    DIRECTORIES: %s
    FILE: %s
    PATH_PARTS

    Another way to look at it, at least. Hope that helps.

      You forgot about the path_segments method.
      C:\>perl -MURI -e"die $_= ( URI->new( shift )->path_segments )[-1] " /somepath/script.php/image.jpg
      image.jpg


Re: Regex simple quicky question :)
by TheEnigma (Pilgrim) on Oct 31, 2004 at 03:14 UTC
    $url =~ /.+\/(.+)/;
    $image_name = $1;
    This will make use of the fact that regexes tend to be greedy, and match as much as possible. In the above, the . means anything but a newline, the + means one or more of them, and the \/ means a slash. The \ is needed in front of the slash to "escape" it, so it isn't interpreted as the second delimiter of the regex. So the regex up to that point will match everything up to the last slash. Then the parentheses around the .+ (one or more of anything again), will store what the .+ matches in the $1 variable. Then the next line puts that in $image_name.

    If you wanted to be sure to only match on a JPEG, you could use (.+\.jpg) for the part in parentheses instead.
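
    To see the greedy match and the JPEG-only variant side by side, here is a small sketch. The helper names last_segment and jpg_name are invented for the example, and it uses m{...} delimiters so the slash needs no escaping; otherwise the patterns are the ones described above. Checking whether the match succeeded before reading $1 is an addition, since $1 keeps its old value when a match fails.

```perl
use strict;
use warnings;

# Hypothetical helper: the leading .+ is greedy, so for URLs like the
# ones below the capture begins right after the last slash.
sub last_segment {
    my ($url) = @_;
    return $url =~ m{.+/(.+)} ? $1 : undef;
}

# Hypothetical helper: same idea, but the capture must end in ".jpg".
# There is no end anchor, exactly as in the pattern suggested above.
sub jpg_name {
    my ($url) = @_;
    return $url =~ m{.+/(.+\.jpg)} ? $1 : undef;
}

for my $url ( 'http://www.made_up_name.com/image.jpg',
              'http://www.made_up_name.com/somepath/image.gif' ) {
    my $any = last_segment($url);
    my $jpg = jpg_name($url);
    printf "%s\n  any: %s\n  jpg: %s\n", $url,
        defined $any ? $any : '(no match)',
        defined $jpg ? $jpg : '(no match)';
}
```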

    You really should check out perlrequick, perlretut, and perlre to learn more about regexes, because I left out a lot in my explanation above. Most of the basics aren't that hard to learn, and they're a lot of fun, I think.

    TheEnigma

Re: Regex simple quicky question :)
by TROGDOR (Scribe) on Oct 31, 2004 at 19:48 UTC
    Here's the quick'n'dirty way I grab filenames off of paths:
    $_ = $url;
    ($filename) = /([^\/]*)$/;
    print "filename is $filename\n";
    This will grab all characters at the end of the string that are not forward slashes. In this case, "image.jpg". This works for Unix paths and urls, as long as the url doesn't have arguments on the end. It also works nicely for relative paths, even if the file is in the current directory. (i.e. no slashes in the path.)
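
    If the URL might carry arguments, one possible workaround is to cut them off before matching. The sketch below is an assumption about how that could look, not part of the original snippet; the helper name filename_from is hypothetical, and it treats everything from the first '?' or '#' onward as query string or fragment.

```perl
use strict;
use warnings;

# Hypothetical helper: drop any query string or fragment, then take the
# trailing run of non-slash characters, as in the snippet above.
sub filename_from {
    my ($path) = @_;
    $path =~ s/[?#].*//s;                    # cut from the first '?' or '#'
    my ($filename) = $path =~ m{([^/]*)$};   # may be '' if the path ends in '/'
    return $filename;
}

print filename_from('http://www.made_up_name.com/somepath/image.jpg?p=1&q=2#name'), "\n";
print filename_from('image.jpg'), "\n";
```

    Both calls print image.jpg. A path that ends in a slash yields an empty string rather than a failed match, because [^/]* is allowed to match nothing.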

    TROGDOR
Re: Regex simple quicky question :)
by injunjoel (Priest) on Nov 01, 2004 at 00:02 UTC
    Greetings all,
    #!/usr/bin/perl
    use strict;
    use Dumpvalue;

    my $dumper = new Dumpvalue;
    my $url_string = "http://www.made_up_name.com/image.jpg there might be text between as well since an url like so: http://www.made_up_name.com/image2.gif says to me this might be needed in a larger context so lets try with a string http://www.made_up_name.com/image3_the%20third.png and see what we get.";

    # Keep the last '/'-separated piece of each matched URL. The original
    # used "split/\//; pop @_", which relied on split filling @_ implicitly;
    # newer Perls no longer do that, so a list slice is used here instead.
    my @images = map { ( split m{/}, $_ )[-1] }
        $url_string =~ m!(http://\S+/\S+\.[a-z]{3,4})!g;
    $dumper->dumpValues(\@images);
    exit;

    OUTPUTS:
    0  ARRAY(0x155ac4c)
       0  'image.jpg'
       1  'image2.gif'
       2  'image3_the%20third.png'

    -InjunJoel