Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hi, I've got $filename, I wanna find out the extension.
How do I regex my $filename as follows:
1. Begin search from the end of the string.
2. Search up to the n'th occurrence of 'a_string'
Re: Extract string from rear of string
by grinder (Bishop) on Dec 28, 2001 at 02:35 UTC
|
You can use rindex to find the index of the rightmost occurrence of a character in a string, and then use that index with substr. But what you really want is File::Basename, which is part of the standard distribution.
update: Indeed, Aigherach is right about the fact that rindex takes a substring, but that is not germane to the discussion. If you're trying to identify the extension, that probably means you're looking for the rightmost dot in a string.
substr( $s, 0, rindex( $s, '.' )); # the part before the dot
substr( $s, rindex( $s, '.' ) + 1); # the part after the dot
But that has portability considerations, and as such is dealt with by fileparse from the File::Basename module.
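For anyone who wants to try both approaches side by side, here is a minimal sketch. The helper name split_at_last_dot is made up for illustration, and the guard for a missing dot is my own addition (rindex returns -1 when the substring is not found).

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper (not from the post): split at the rightmost dot.
# Returns the whole string and undef when there is no dot at all,
# because rindex returns -1 in that case.
sub split_at_last_dot {
    my ($s) = @_;
    my $dot = rindex($s, '.');
    return ($s, undef) if $dot < 0;
    return (substr($s, 0, $dot), substr($s, $dot + 1));
}

my ($before, $after) = split_at_last_dot('report.final.txt');
print "$before | $after\n";    # report.final | txt

# fileparse returns (name, directory, suffix); the suffix pattern here
# is the one shown in the File::Basename documentation.
my ($name, $dir, $suffix) = fileparse('/var/log/apache/error.log', qr/\.[^.]*/);
print "$dir | $name | $suffix\n";    # /var/log/apache/ | error | .log
```

Note that the rindex version looks at the whole string, so a dot in a directory name would confuse it; fileparse only looks at the last path component.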
--g r i n d e r
just another bofh
print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u';
|
Actually, it returns the position of the rightmost
occurrence of a substring rather than a character. Though, you could use it to check for a substring that is only one character.
rindex STR,SUBSTR,POSITION
rindex STR,SUBSTR
Works just like index() except that it returns the
position of the LAST occurrence of SUBSTR in STR.
If POSITION is specified, returns the last occurrence
at or before that position.
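The POSITION argument is what makes the "n'th occurrence from the end" part of the original question workable. A rough sketch, assuming n is at least 1; nth_rindex is a made-up name, not a built-in:

```perl
use strict;
use warnings;

# Hypothetical helper: position of the n-th occurrence of $needle,
# counting from the end of $str, or -1 if there are fewer than n.
sub nth_rindex {
    my ($str, $needle, $n) = @_;
    my $pos = length $str;
    for (1 .. $n) {
        return -1 if $pos < 0;               # ran off the left edge
        $pos = rindex($str, $needle, $pos);  # last occurrence at or before $pos
        return -1 if $pos < 0;               # not found
        $pos--;                              # next search starts one to the left
    }
    return $pos + 1;
}

my $file = 'linux-2.4.17.tar.gz';
my $at   = nth_rindex($file, '.', 2);
print substr($file, $at + 1), "\n" if $at >= 0;    # tar.gz
```

Stepping one character to the left each time means overlapping occurrences of a longer substring are counted separately; step by the length of the substring instead if that is not what you want.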
-- Snazzy tagline here
|
$ext = (split /\./, $fname)[-1];
But neither takes note of the fact that the file might not have an extension...
|
..or more than one dot... like linux-2.4.17.tar.gz
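To make the two caveats concrete, here is a small sketch. The helper last_field_ext is hypothetical; it just adds a "was there a dot at all?" check around the split.

```perl
use strict;
use warnings;

# The one-liner from above: with no dot, the whole name comes back.
for my $fname ('error.log', 'README', 'linux-2.4.17.tar.gz') {
    my $ext = (split /\./, $fname)[-1];
    print "$fname => $ext\n";    # log, README, gz
}

# Hypothetical helper: undef when the name contains no dot.
# The limit of -1 keeps a trailing empty field, so 'foo.' gives ''.
# A dotfile such as '.bashrc' is still reported as extension 'bashrc'.
sub last_field_ext {
    my ($fname) = @_;
    my @parts = split /\./, $fname, -1;
    return @parts > 1 ? $parts[-1] : undef;
}
```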
|
Re: Extract string from rear of string
by talexb (Chancellor) on Dec 28, 2001 at 08:36 UTC
|
$FileName = "/var/log/apache/error.log";
$FileName =~ m/\.(\w*)$/;
$Extension = $1;
should do the trick. In this code fragment I'm looking for a real period, followed by any number of word characters, followed by the end of the string. I put that in brackets so that I can grab it as $1 later on.
This of course makes the assumption that your definition of a file extension is the group of characters to the right of the last period in a file name .. so the file extension of "foo.bar.baz" would be "baz". Your code should also handle the situation where there are no periods in the filename.
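One way to handle the no-period case is to do the match in list context, so a failed match gives undef instead of whatever $1 held from an earlier match. A minimal sketch; extension_of is a made-up wrapper name:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the match above. In list context a failed
# match returns an empty list, so $extension is undef when there is no
# period followed only by word characters at the end of the name.
sub extension_of {
    my ($file_name) = @_;
    my ($extension) = $file_name =~ m/\.(\w*)$/;
    return $extension;
}

for my $f ('/var/log/apache/error.log', 'foo.bar.baz', 'README') {
    my $ext = extension_of($f);
    print "$f => ", (defined $ext ? $ext : '(none)'), "\n";    # log, baz, (none)
}
```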
ps I highly recommend reading Programming Perl (Third Edition) by Wall, Christiansen & Orwant, published by O'Reilly.
"Excellent. Release the hounds." -- Monty Burns.
|
You really should not use \w, as many file systems allow characters other than alphanumerics. For instance, 'file.$#@%' would be a valid name but would break your code.
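One possible alternative, as a sketch only: capture anything that is neither a dot nor a slash. Excluding the slash is my own assumption, so that a dot in a directory name is not mistaken for the start of an extension; extension_any_chars is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical helper: the extension is whatever follows the last dot,
# as long as it contains no further dot and no '/' path separator.
sub extension_any_chars {
    my ($file_name) = @_;
    my ($extension) = $file_name =~ m/\.([^.\/]*)$/;
    return $extension;
}

print extension_any_chars('file.$#@%'), "\n";    # $#@%
print defined extension_any_chars('/etc/conf.d/README') ? "ext\n" : "no ext\n";    # no ext
```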
Re: Extract string from rear of string
by dmmiller2k (Chaplain) on Dec 28, 2001 at 03:30 UTC
|
|
That's rather amusing... By following your link I actually get on
to another post that also mentions this particular thread
and so I get stuck in an infinite loop ;D.
But, yeah, actually, it does sort of seem like someone's
trying to get his homework question done by exploiting
monks' sense of selflessness. What do you think? In general,
I'm against students who practice this. Unless you figure this
out on your own, there's no way you'd be able to learn.
It's absolutely OK when you get stuck on some obnoxious bugs
or what not (like I always do ;)... however, the question this
anonymous monk has posted seems more like a homework question
from some nightmarish Perl class hehe. Therefore, I would be
hesitant to help... sorry.
"There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith
|
Re: Extract string from rear of string
by ppg (Initiate) on Dec 28, 2001 at 16:05 UTC
|
I think there's a File::Basename module that works similarly to the Unix basename command, which should be able to return the extension somehow, but my book's a bit hazy on how it works.
Re: Extract string from rear of string
by thunders (Priest) on Dec 29, 2001 at 02:52 UTC
|
Update: Juerd is correct in his post below. I made a mistake in my post regarding the split operator. I have corrected this.
Here is an example of File::Basename, applied to the current directory. This is not perfect, but should get you started.
#!/usr/bin/perl -w
use File::Basename;
use strict;
my @files = <*>;
foreach my $file (@files) {
my ($name,$dir,$type) = fileparse($file,'\..*');
print sprintf("file= %30s", $name), sprintf(" ext= %10s", $type),"\n";
}
You can refine this by crafting a better regex as the second argument to fileparse. This module is overkill unless you are dealing with full file paths. For a single directory you could just use
my($filename,$ext) = split(/\./,$file);
or something similar.
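As an illustration of how much the second argument to fileparse matters, here is a small comparison on a name with several dots. The wrapper parts_for is hypothetical, and the alternative patterns are just examples, not the only reasonable choices.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical wrapper: returns "name|suffix" for a given suffix pattern.
sub parts_for {
    my ($file, $pattern) = @_;
    my ($name, $dir, $suffix) = fileparse($file, $pattern);
    return "$name|$suffix";
}

my $file = 'linux-2.4.17.tar.gz';

# '\..*' takes everything from the FIRST dot onward.
print parts_for($file, '\..*'), "\n";          # linux-2|.4.17.tar.gz

# Only the part after the last dot.
print parts_for($file, qr/\.[^.]*/), "\n";     # linux-2.4.17.tar|.gz

# A specific compound suffix.
print parts_for($file, qr/\.tar\.gz/), "\n";   # linux-2.4.17|.tar.gz
```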
|
print sprintf("file= %30s", $name), sprintf(" ext= %10s", $type),"\n";
Personally, I'd use:
printf "file = %30s ext= %10s\n", $name, $type;
my($filename,$ext) = split('\..*',$file);
While this is valid syntax, using a string as split's first argument might be confusing to beginners.
Every string that is not a single space (\x20) is interpreted as a regex. Using slashes or another m// makes your intention clear.
my ($filename, $ext) = split /\..*/, $file;
Splitting on /\..*/ would return ('foo', undef) for 'foo.bar'.
Splitting on /\./ would probably fix this, but you don't want ('foo', 'bar', 'baz') or (using a limit) ('foo', 'bar.baz').
So using a regex without split would probably be best:
my ($filename, $ext) = $file =~ /^(.+?)(?:\.([^.]*))?$/s;
(The first group is non-greedy (.+?) and the extension part cannot contain a dot, so the split happens at the last dot. A greedy .+ followed by an optional group would swallow the whole name and always leave $ext undefined. The /s was added just in case someone has a linefeed in his filename. The $ anchor is needed so that the non-greedy group extends as far as the last dot. I used .+ rather than .* for dotfiles (filenames beginning with a dot are hidden files in *nix). The extension part is optional ( (?:)? ) because not all files have an extension.)
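A quick way to check the split cases discussed above, plus one possible pattern for the name/extension split. The pattern in name_and_ext is a sketch of my own with a non-greedy first group and a dot-free extension, so that the optional extension is still captured; the helper name is made up.

```perl
use strict;
use warnings;

# The split variants: the counts show what actually comes back.
my @whole_tail = split /\..*/, 'foo.bar';          # ('foo'), so $ext would be undef
my @every_dot  = split /\./,   'foo.bar.baz';      # ('foo', 'bar', 'baz')
my @limited    = split /\./,   'foo.bar.baz', 2;   # ('foo', 'bar.baz')
print scalar(@whole_tail), " ", scalar(@every_dot), " ", scalar(@limited), "\n";    # 1 3 2

# Hypothetical helper: name and optional extension, split at the last dot.
# A name with no dot, or only a leading dot, yields an undefined extension.
sub name_and_ext {
    my ($file) = @_;
    my ($name, $ext) = $file =~ /^(.+?)(?:\.([^.]*))?$/s;
    return ($name, $ext);
}

for my $f ('foo.bar.baz', 'README', '.bashrc') {
    my ($name, $ext) = name_and_ext($f);
    print "$f => $name, ", (defined $ext ? $ext : '(none)'), "\n";
}
```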
2;0 juerd@ouranos:~$ perl -e'undef christmas'
Segmentation fault
2;139 juerd@ouranos:~$