Since you are only matching a "." and a "/", and you don't particularly care what surrounds them, you don't need a regular expression.

It's a job for good old-fashioned substr and rindex.

#!/usr/bin/perl -w
use strict;

my $URK = 'http:/://blah.foo.comwhatever/dir/file.extensionelarocko?query=blah&fckj=ekjl';

# rindex returns the position of the LAST occurrence, or -1 if absent
my $last_slash = rindex $URK, '/';
my $last_dot   = rindex $URK, '.';
my $query      = rindex $URK, '?';

print "$URK\n\n";

if ( ($last_slash >= 0) and ($last_dot >= 0) ) {
    # everything after the last '/' (query string included, for now)
    printf "%36.36s: %s\n\n", "looks like the file name is",
        substr($URK, $last_slash + 1);
    # everything after the last '.'
    printf "%36.36s: %s\n\n", "and the extension is",
        substr($URK, $last_dot + 1);
}
else {
    print "seems like we got index.something on our hands\n\n";
}

if ($query >= 0) {
    printf "%36.36s: %s\n\n", "We even got a query string, whoa",
        substr($URK, $query + 1);
    # stop just before the '?': the length is ($query - $last_slash - 1)
    printf "%36.36s: %s\n\n", "so the true filename would be",
        substr($URK, $last_slash + 1, $query - $last_slash - 1);
    printf "%36.36s: %s\n\n", "and the true file extension would be",
        substr($URK, $last_dot + 1, $query - $last_dot - 1);
}

__END__

=head1 RESULTS

http:/://blah.foo.comwhatever/dir/file.extensionelarocko?query=blah&fckj=ekjl

         looks like the file name is: file.extensionelarocko?query=blah&fckj=ekjl

                and the extension is: extensionelarocko?query=blah&fckj=ekjl

    We even got a query string, whoa: query=blah&fckj=ekjl

       so the true filename would be: file.extensionelarocko

and the true file extension would be: extensionelarocko

=cut
Looks like what you asked for to me.

Also, CGI.pm has regexes that will give you all kinds of good stuff from the query string and request URL. Depending on your needs, you can either use CGI.pm directly or borrow the code from the module (script_name(), path_translated(), path_info()).

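## AND IF YOU WANNA CHECK FOR A VALID URL, YOU REALLY NEED A MODULE (URI::URL)
## BUT substr and rindex are still the best for the job

my $url = 'proto://domain.something/dir/file.extension';

# 4-argument substr returns 'proto' AND removes it from $url,
# leaving '://domain.something/dir/file.extension' for the next step.
# Careful: if '://' is missing, index() returns -1 and this chops the
# wrong thing, so check for that first in real code.
my $protocol = substr $url, 0, index($url, '://'), '';

## yada yada yada, you get the point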
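Spelled out a little further, that peel-from-the-left idea might look something like this. It is only a sketch under the same assumptions as the snippet above (a `protocol://domain/path` URL, no query string, no validation whatsoever), and `split_url` is a made-up helper name, not anything from CGI.pm or URI::URL:

```perl
use strict;
use warnings;

# Hypothetical helper: peels a URL apart from the left with 4-argument
# substr, which returns the requested piece AND deletes it from $url.
# Not a validator; assumes 'protocol://domain/path' with no query string.
sub split_url {
    my ($url) = @_;
    my %part;

    my $sep = index $url, '://';
    return if $sep < 0;                         # index() gives -1 when '://' is absent
    $part{protocol} = substr $url, 0, $sep, ''; # $url is now '://domain...'
    substr $url, 0, 3, '';                      # throw away the '://'

    my $slash = index $url, '/';
    $slash = length $url if $slash < 0;         # no path: the rest is the domain
    $part{domain} = substr $url, 0, $slash, ''; # $url is now '/dir/file.ext' (or '')
    $part{path}   = $url;

    # file name and extension live at the right-hand end, so use rindex
    $part{file} = substr $url, rindex($url, '/') + 1;
    my $dot = rindex $part{file}, '.';
    $part{extension} = $dot >= 0 ? substr($part{file}, $dot + 1) : '';

    return \%part;
}

my $p = split_url('proto://domain.something/dir/file.extension');
print "$_: $p->{$_}\n" for sort keys %$p;
```

Each 4-argument substr both hands back a piece and shortens `$url`, so every step only has to look at the start of whatever is left; only the file name and extension switch over to rindex, since they are anchored at the other end.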

However, a regular expression might be "easier" to digest, something along these lines (as "others" have already shown):

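my $url = 'proto://foo.combarz.erk/file.ext?query';

# $1 = protocol, $2 = domain (the trailing '/' is captured too),
# $3 = everything after it: directories, file name and query string
my ($proto, $domain, $filedirquery) =
    $url =~ m|(\w{2,6})://([.a-zA-Z0-9-]+/)(.*?)$|;

print "($proto, $domain, $filedirquery)\n";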
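If all you actually want is the extension, the regex can go straight for it. A rough sketch, assuming a `scheme://host/path` URL; `url_extension` is a hypothetical helper name, and this is no more of a URL validator than the regex above:

```perl
use strict;
use warnings;

# Hypothetical helper, regex flavour.
# Step 1: skip 'scheme://host' and keep only the path, stopping before
#         any '?query' or '#fragment'.
# Step 2: take the text after the last dot, provided no '/' follows it.
sub url_extension {
    my ($url) = @_;
    my ($path) = $url =~ m{^[a-zA-Z][\w+.-]*://[^/?#]*([^?#]*)}
        or return;                          # not scheme://... at all
    my ($ext) = $path =~ m{\.([^./]+)\z};
    return $ext;                            # undef when there is no extension
}

for my $u ('proto://foo.combarz.erk/file.ext?query',
           'http://example.com/dir.d/README',
           'http://example.com') {
    my $ext = url_extension($u);
    printf "%-40s %s\n", $u, defined $ext ? $ext : '(none)';
}
```

Doing it in two matches keeps each regex simple: the first one stops the host name (`example.com`) from being mistaken for a file extension, and the second refuses a dot that sits in a directory name.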
update: Thu Sep 13 10:26:58 2001 GMT
demerphq has some valid points. My regex obviously isn't complete, and my "substr & index" solution, which I say is the way to tackle this, isn't "validating" and doesn't handle all the possible cases, but then again, it doesn't pretend to. I recommended using CGI.pm. For simply getting the file extension from a URL, ignoring the possibility of a query string and assuming that the filename is in name.extension format,

print substr 'file.htm', 1 + rindex 'file.htm', '.';

cannot be beat.
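If you do want that one-liner to survive a query string, a fragment, or a dot in a directory name, it can be hardened a bit while still using only substr, index and rindex. A sketch, with `file_ext` as a hypothetical helper name; like the one-liner, it knows nothing about host names, so a bare `http://example.com` comes back as `com`:

```perl
use strict;
use warnings;

# Hypothetical helper: the substr/rindex one-liner, hardened slightly.
# Cuts off '#fragment' and then '?query' first, and only accepts a dot
# that comes AFTER the last '/', so '/dir.d/README' has no extension.
sub file_ext {
    my ($url) = @_;
    for my $mark ('#', '?') {
        my $at = index $url, $mark;
        $url = substr $url, 0, $at if $at >= 0;
    }
    my $slash = rindex $url, '/';
    my $dot   = rindex $url, '.';
    return '' if $dot <= $slash;             # no dot in the last segment
    return substr $url, $dot + 1;
}

print file_ext($_), "\n" for 'file.htm',
                             '/dir/page.html?q=1#top',
                             '/dir.d/README';
```

The fragment is cut before the query because a '?' may legally appear inside a fragment, whereas a '#' always starts one.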

Anyway, lots of good reading in this thread.

 
___crazyinsomniac_______________________________________
Disclaimer: Don't blame. It came from inside the void

perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"


In reply to (crazyinsomniac) Re: Regex to match file extension in URL by crazyinsomniac
in thread Regex to match file extension in URL by Amoe
