in reply to Regex to match file extension in URL

Since you are only matching a "." and a "/", and you don't particularly care what surrounds them, you don't need a regular expression.

It's a job for good old-fashioned substr and rindex:

```perl
#!/usr/bin/perl -w
use strict;

my $URK = 'http:/://blah.foo.comwhatever/dir/file.extensionelarocko?query=blah&fckj=ekjl';

my $last_slash = rindex $URK, '/';
my $last_dot   = rindex $URK, '.';
my $query      = rindex $URK, '?';

print "$URK\n\n";

# %36.36s so the longest label below is not cut off
if ( ($last_slash >= 0) and ($last_dot >= 0) ) {
    printf "%36.36s: %s\n\n", "looks like the file name is",
        substr( $URK, $last_slash + 1 );
    printf "%36.36s: %s\n\n", "and the extension is",
        substr( $URK, $last_dot + 1 );
}
else {
    print "seems like we got index.something on our hands\n\n";
}

if ( $query >= 0 ) {
    printf "%36.36s: %s\n\n", "We even got a query string, whoa",
        substr( $URK, $query + 1 );
    # the length stops just before the '?'
    printf "%36.36s: %s\n\n", "so the true filename would be",
        substr( $URK, $last_slash + 1, $query - $last_slash - 1 );
    printf "%36.36s: %s\n\n", "and the true file extension would be",
        substr( $URK, $last_dot + 1, $query - $last_dot - 1 );
}

__END__

=head1 RESULTS

http:/://blah.foo.comwhatever/dir/file.extensionelarocko?query=blah&fckj=ekjl

         looks like the file name is: file.extensionelarocko?query=blah&fckj=ekjl

                and the extension is: extensionelarocko?query=blah&fckj=ekjl

    We even got a query string, whoa: query=blah&fckj=ekjl

       so the true filename would be: file.extensionelarocko

and the true file extension would be: extensionelarocko

=cut
```
Looks like what you asked for to me.
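If you need this more than once, the same substr/rindex idea can be wrapped in a sub. This is a minimal sketch, not code from this thread. `file_and_extension` is a made-up name. It assumes a fragment starts at the first '#' and a query string starts at the first '?'. Both are cut off before the sub looks for the last '/' and '.'.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the file name and extension of a URL
# using only index, rindex and substr. Assumes the extension is whatever
# follows the last '.' in the last path segment; no validation is done.
sub file_and_extension {
    my ($url) = @_;

    # Cut off a fragment ('#...') first, then a query string ('?...').
    my $hash = index $url, '#';
    $url = substr $url, 0, $hash if $hash >= 0;
    my $qmark = index $url, '?';
    $url = substr $url, 0, $qmark if $qmark >= 0;

    # The file name is everything after the last '/'
    # (rindex returns -1 if there is none, so substr starts at 0).
    my $file = substr $url, 1 + rindex $url, '/';

    # No '.' in the file name means no extension.
    my $dot = rindex $file, '.';
    my $ext = $dot >= 0 ? substr( $file, $dot + 1 ) : '';

    return ( $file, $ext );
}

my ( $file, $ext ) = file_and_extension(
    'http://blah.foo.com/dir/file.extensionelarocko?query=blah&fckj=ekjl' );
print "$file\n$ext\n";    # prints file.extensionelarocko, then extensionelarocko
```

Because the query string is removed up front, the '?' can never end up in the file name or the extension.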

Also, CGI.pm has regexes that will give you all kinds of good stuff from the query string and request URL (script_name(), path_translated(), path_info()). Depending on your needs, you can either use CGI.pm or steal the code from the module.

```perl
## AND IF YOU WANNA CHECK FOR A VALID URL, YOU REALLY NEED A MODULE (URI)
## BUT substr and rindex are still the best for the job
my $url = 'proto://domain.something/dir/file.extension';

# 4-arg substr: returns everything before '://' and removes it from $url
my $protocol = substr $url, 0, index($url, '://'), '';

## yada yada yada, you get the point
```
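To spell out the "yada yada yada", here is one way the rest of the split could go with index and substr. This is only a sketch. It assumes the URL has the form proto://host/path, and it does no validation at all.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch only: splits a URL of the assumed form proto://host/path
# with index and substr. Nothing here checks that the URL is valid.
my $url = 'proto://domain.something/dir/file.extension';

my $sep = index $url, '://';
die "no '://' in '$url'\n" if $sep < 0;

# 4-arg substr returns the protocol and deletes it from $url,
# leaving '://domain.something/dir/file.extension'.
my $protocol = substr $url, 0, $sep, '';

# Skip past the '://' itself.
$url = substr $url, 3;

# The host runs up to the first '/'; the path is the rest (or '/').
my $slash = index $url, '/';
my ( $host, $path ) = $slash >= 0
    ? ( substr( $url, 0, $slash ), substr( $url, $slash ) )
    : ( $url, '/' );

print "$protocol | $host | $path\n";
# prints: proto | domain.something | /dir/file.extension
```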

However, a regular expression might be "easier" to digest, something along these lines (as "others" have already shown):

```perl
my $url = 'proto://foo.combarz.erk/file.ext?query';
my ($proto, $domain, $filedirquery) =
    $url =~ m|(\w{2,6})://([.a-zA-Z0-9-]+/)(.*?)$|;
print "($proto, $domain, $filedirquery)\n";
```
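The regex above stops at the path. A possible follow-up uses two more regexes to pull out the file name and extension. Again, this is a sketch and not a validating URL parser. `file_and_ext_re` is a hypothetical name. The pattern assumes the file name is the last path segment, ending at '?', '#' or the end of the string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch, not a validating parser: grabs the last path segment of a URL
# (stopping at '?' or '#') and then whatever follows its last '.'.
sub file_and_ext_re {
    my ($url) = @_;

    # Leftmost run of characters other than '/', '?' and '#' that is
    # followed only by an optional query string/fragment, then the end.
    my ($file) = $url =~ m{([^/?#]*)(?:[?#].*)?$};

    # Extension: the part after the last '.', or '' if there is none.
    my ($ext) = $file =~ m{\.([^.]*)$};

    return ( $file, defined $ext ? $ext : '' );
}

my ( $file, $ext ) = file_and_ext_re('proto://foo.combarz.erk/file.ext?query');
print "($file, $ext)\n";    # prints (file.ext, ext)
```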
update: Thu Sep 13 10:26:58 2001 GMT
demerphq has some valid points. My regex obviously isn't complete, and my "substr & index" solution, which I say is the way to tackle this, isn't "validating" and doesn't handle all the possible cases. Then again, it doesn't look like it needs to. I did recommend using CGI.pm. But for simply getting the file extension from a URL, ignoring the possibility of a query string and assuming that the filename is in name.extension format, this cannot be beat:

```perl
print substr 'file.htm', 1 + rindex 'file.htm', '.';
```
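Here is what "ignoring the possibility of a query string" and "assuming name.extension" mean in practice. The one-liner from the update is wrapped in a hypothetical `ext_of` sub and run on a few made-up inputs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The one-liner from the update, wrapped in a hypothetical sub so it
# can be tried on a few names.
sub ext_of {
    my ($name) = @_;
    return substr $name, 1 + rindex $name, '.';
}

print ext_of('file.htm'),     "\n";    # htm
print ext_of('file.htm?x=1'), "\n";    # htm?x=1 (query string is kept)
print ext_of('README'),       "\n";    # README (rindex returns -1, so substr starts at 0)
```

So the one-liner is fine as long as those two assumptions hold. Otherwise, strip the query string first and check rindex for -1.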

Anyway, lots of good reading in this thread.

___crazyinsomniac_______________________________________
Disclaimer: Don't blame. It came from inside the void

perl -e "$q=$_;map({chr unpack qq;H*;,$_}split(q;;,q*H*));print;$q/$q;"