cosmicperl has asked for the wisdom of the Perl Monks concerning the following question:

Hi Guys,
   I have a regular expression problem. Currently I'm doing it the long-winded way:

$systempath = "$ENV{'PATH_TRANSLATED'}";
$systempath =~ s/(\\[a-z0-9]*\.cgi)$//g;
if ($&) {$extention = $&;} ## End if
$systempath =~ s/(\\[a-z0-9]*\.pl)$//g;
if ($&) {$extention = $&;} ## End if
$systempath =~ s/(\\[a-z0-9]*\.asp)$//g;
if ($&) {$extention = $&;} ## End if
$extention =~ /\./;
$extention = $';

I'm trying to create two variables from $ENV{'PATH_TRANSLATED'}: one for the system path to the scripts folder, and another for the extension of the script, be it .cgi, .pl or .asp (I'm dabbling with PerlScript for ASP as well).

As this is at the top of my scripts I'd like to minimise it. The full chunk is:-

BEGIN {
    if (($^O eq 'MSWin32') || defined($ENV{'OS'})) {
        ##### Get ENV
        unless ($ENV{'PATH_TRANSLATED'} || $ENV{'SCRIPT_FILENAME'}) {
            $aspmode = 1;
            $ENV{'PATH_TRANSLATED'} = $Request->ServerVariables('PATH_TRANSLATED')->item;
            $ENV{'SCRIPT_FILENAME'} = $Request->ServerVariables('SCRIPT_FILENAME')->item;
        } ## End unless
        $operatingsystem = 0;
        $osstring = "Win32 - NT, 2000, 2003";
        $operatingsystemoldnt = 0;
        $systempath = "$ENV{'PATH_TRANSLATED'}";
        unless ($systempath) {
            $systempath = "$ENV{'SCRIPT_FILENAME'}";
        } ## End unless
        $systempath =~ s/(\\[a-z0-9]*\.cgi)$//g;
        if ($&) {$extention = $&;} ## End if
        $systempath =~ s/(\\[a-z0-9]*\.pl)$//g;
        if ($&) {$extention = $&;} ## End if
        $systempath =~ s/(\\[a-z0-9]*\.asp)$//g;
        if ($&) {$extention = $&;} ## End if
        $extention =~ /\./;
        $extention = $';
        # $operatingsystemoldnt = 1;
        # $slash = '\\';
        $slash = '/';
    } ## End if
    else {
        $operatingsystem = 1;
        $osstring = "Unix - Linux";
        $systempath = "$ENV{'SCRIPT_FILENAME'}";
        $systempath =~ s/(\/[a-z0-9]*\.cgi)$//g;
        if ($systempath =~ /cgiwrap/) {
            $systempath = "$ENV{'PATH_TRANSLATED'}";
            $systempath =~ s/(\/[a-z0-9]*\.cgi)$//g;
        } ## End if
        $slash = '/';
    } ## End else
    ## $systempath = "systempath to your folder"; ## Enter the correct value and un-comment this if you are having system path detection problems
    push (@INC, "$systempath");
} ## End BEGIN

That's in case you can give me any other tips on minimising or improving it.

Thanks!

janitored by ybiC: Balanced <code> tags around regex example

Replies are listed 'Best First'.
Re: Quick REGEXP question
by jimbojones (Friar) on Oct 31, 2004 at 01:51 UTC
    Hi

    File::Spec with the splitpath method can help you determine file paths. This doesn't give the extension, but you can find it yourself pretty easily. Or you can do a quick and dirty split on "/" to get a path and filename. You could do either:
    use File::Spec;
    use warnings;

    foreach my $systempath ( <DATA> ) {
        chomp $systempath;
        print "-"x50, "\nPATH: $systempath\n\n";
        my ($volume,$directories,$file) = File::Spec->splitpath( $systempath );

        #-- regex with replacement to get the extension
        my $extension = "";
        if ( $file =~ s/(\.[^\.]*)\s*$// ) {
            $extension = $1;
        }
        print "File::Spec way\n";
        print "Vol\t", $volume, "\n";
        print "Dir\t",$directories, "\n";
        print "File\t", $file, "\n";
        print "Ext\t", $extension, "\n";
    }
    __DATA__
    C:\program files\temp\abc.exe
    /usr/local/bin/abc
    C:/this/is/a path/only/
    /is/this/a/path/or/a/file
    /this/is/a/path/
    Some further suggestions.
    • Don't use $`, $&, etc. See 673. Use $1, $2, etc. instead.
    • Use s///. For example, $extension =~ s/^\.// will remove the leading "." from the extension.
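
The two suggestions above can be combined. What follows is a minimal sketch, not the poster's own code: it assumes a made-up sample path in place of $ENV{'PATH_TRANSLATED'}, and it uses a single alternation so that one s/// replaces the three separate substitutions from the question. The capture in $1 already excludes the dot, so no $& or $' is needed.

```perl
use strict;
use warnings;

# Hypothetical sample value standing in for $ENV{'PATH_TRANSLATED'}.
my $systempath = 'C:\\Inetpub\\scripts\\index.cgi';
my $extension  = '';

# One substitution handles all three extensions and either slash style.
# The parentheses capture the extension (without the dot) into $1.
if ( $systempath =~ s/[\\\/][a-z0-9]*\.(cgi|pl|asp)$// ) {
    $extension = $1;
}

print "path: $systempath\n";
print "ext:  $extension\n";
```

If the substitution does not match, $systempath is left untouched and $extension stays empty, which is easier to test for than a stale $& from an earlier match.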
    - jim
Re: Quick REGEXP question
by ysth (Canon) on Oct 31, 2004 at 03:00 UTC
    File::Basename is the core module for splitting up the folder, base filename, and extension. It defaults to interpreting based on $^O, but can also interpret foreign os paths.
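
As a rough illustration of the File::Basename approach, here is a small sketch. The sample path is made up, and the call to fileparse_set_fstype is only there so the Windows-style path is parsed the same way on any OS; in a real script running on the target machine the default based on $^O would normally be enough.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse fileparse_set_fstype);

# Hypothetical Windows-style path standing in for $ENV{'PATH_TRANSLATED'}.
my $path = 'C:\\Inetpub\\scripts\\index.cgi';

# Interpret the path with Windows rules, whatever OS this runs on.
fileparse_set_fstype('MSWin32');

# The suffix pattern is a regex matched against the end of the filename.
my ( $name, $dir, $suffix ) = fileparse( $path, qr/\.(?:cgi|pl|asp)/ );

print "dir:    $dir\n";
print "name:   $name\n";
print "suffix: $suffix\n";
```

Note that fileparse keeps the trailing separator on the directory and the leading dot on the suffix, so either may need trimming depending on how the values are used later.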
Re: Quick REGEXP question
by graff (Chancellor) on Oct 31, 2004 at 03:56 UTC
    Definitely follow up on the File::Basename suggestion, because it is handy. As for handling different alternative extensions in a single regex, something like this would also work:
    my ( $ext ) = ( $systempath =~ m{[\\/][a-z0-9]+\.(cgi|pl|asp)$} );
    The parens around $ext turn the left-hand-side into a list context, and the parens within the regex will capture any of the three alternative strings and return the capture as a list (which then gets assigned to $ext).
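
To make the context difference concrete, here is a short sketch using the same regex against made-up sample paths. It compares the list-context assignment with a scalar-context one, and shows what happens when nothing matches.

```perl
use strict;
use warnings;

# Hypothetical sample path, for illustration only.
my $systempath = '/home/user/cgi-bin/index.pl';
my $re         = qr{[\\/][a-z0-9]+\.(cgi|pl|asp)$};

# List context: the match returns its captures, so $ext gets 'pl'.
my ($ext) = $systempath =~ $re;

# Scalar context: the match only reports success, so $matched gets 1.
my $matched = $systempath =~ $re;

# No match in list context returns an empty list, leaving $none undefined.
my ($none) = '/home/user/readme.txt' =~ $re;

print "ext:     $ext\n";
print "matched: $matched\n";
print "none:    ", ( defined $none ? $none : 'undef' ), "\n";
```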
Re: Quick REGEXP question
by TedPride (Priest) on Oct 31, 2004 at 14:57 UTC
    Something like this?
    my ($spath, $ext) = $ENV{'PATH_TRANSLATED'} =~ /^(.*\/).*\.(.*)$/;
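
One caveat: the regex above only looks for forward slashes, while the Windows branch of the question matches backslashes in PATH_TRANSLATED. A possible variant that accepts either separator is sketched below. The helper name split_path and the sample paths are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (directory with trailing separator, extension),
# or an empty list if the path has no separator or no extension.
sub split_path {
    my ($path) = @_;
    return $path =~ m{^(.*[\\/])[^\\/]*\.([^.\\/]*)$};
}

# Hypothetical sample paths, one per separator style.
my ( $win_dir,  $win_ext )  = split_path('C:\\Inetpub\\scripts\\index.cgi');
my ( $unix_dir, $unix_ext ) = split_path('/usr/lib/cgi-bin/test.pl');

print "$win_dir | $win_ext\n";
print "$unix_dir | $unix_ext\n";
```

Unlike the alternation-based regexes, this accepts any extension, so it should only be used where the script name is already trusted.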
Re: Quick REGEXP question
by cosmicperl (Chaplain) on Nov 01, 2004 at 21:06 UTC
    Thanks guys. Appreciate the support. I guess I should get round to reading my Mastering Regular Expressions book, but it's rather daunting! The reason I use $& is that not so long ago I had a customer for whom, for some unknown reason, $1, $2, etc. were empty even when there was a match; when I changed to $& it worked. I think he had a very early release of Perl v5.
      Hi

      Check out
      perldoc perlretut
      for a great intro to regular expressions.

      -jim