Re: Specific instance of a repeated string
by antirice (Priest) on Aug 09, 2003 at 22:00 UTC
What is switches supposed to contain beyond the particular group of numbers it should split upon? Anyhow, this will be messy:
#!/usr/bin/perl -wl
use Data::Dumper;

sub get_split {
    my $filename = shift;
    my $switches = shift;
    my %filesplit;
    # collect every run of digits in the filename
    my @digits = $filename =~ /(\d+)/g
        or die "Error: Could not extract a number from filename '$filename'.\n";
    $filesplit{digit} = $digits[$switches->{numindex}];
    # how many $filesplit{digit} can we find before this sucker?
    my $splits = 2 + grep($_ eq $filesplit{digit}, @digits[0..$switches->{numindex}-1]);
    my @temp = split (/$filesplit{digit}/, $filename, $splits);
    $filesplit{suffix} = pop(@temp);
    $filesplit{prefix} = join $filesplit{digit}, @temp;
    return \%filesplit;
}

print Dumper(get_split("01-file01.html",{numindex=>1}));
print Dumper(get_split("01-file01and01.html",{numindex=>1}));
print Dumper(get_split("02-file01tom34bill01.html",{numindex=>3}));
__DATA__
$VAR1 = {
'digit' => '01',
'suffix' => '.html',
'prefix' => '01-file'
};
$VAR1 = {
'digit' => '01',
'suffix' => 'and01.html',
'prefix' => '01-file'
};
$VAR1 = {
'digit' => '01',
'suffix' => '.html',
'prefix' => '02-file01tom34bill'
};
That is some ugly code. Hope this helps.
antirice
The first rule of Perl club is - use Perl
The ith rule of Perl club is - follow rule i - 1 for i > 1
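One caveat with splitting on the digit string itself: the pattern can also match inside a longer number, so 101-file01.html with numindex 1 comes back with the prefix '1'. Below is a minimal sketch of an alternative, assuming it is acceptable to record match offsets rather than split. The name split_at_number is hypothetical (it is not from this thread), and the index is zero-based like numindex above.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the thread: split $filename around the
# number at zero-based position $index, counting digit runs from the left.
sub split_at_number {
    my ($filename, $index) = @_;
    my @spans;    # [start, end) offsets of every run of digits
    while ($filename =~ /\d+/g) {
        push @spans, [ $-[0], $+[0] ];
    }
    die "No number in '$filename'\n" unless @spans;
    my $span = $spans[$index]
        or die "'$filename' has no number at index $index\n";
    my ($start, $end) = @$span;
    return {
        prefix => substr($filename, 0, $start),
        digit  => substr($filename, $start, $end - $start),
        suffix => substr($filename, $end),
    };
}

my $parts = split_at_number("101-file01.html", 1);
print "$parts->{prefix}|$parts->{digit}|$parts->{suffix}\n";   # 101-file|01|.html
```

Because the prefix and suffix are cut at offsets, it does not matter whether the same digits occur elsewhere in the name.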
Switches contains all the command line switch settings:
All switches are optional.
The numeric-index switch should only be used when a filename has multiple
numbers in it, e.g. 01-file01.html. This switch defaults to -1 which is
the last number in the filename. Specifying the index as 1 will force the script
to increment the first set of numbers. Specifying the index as 2 will force
the script to increment the second set of numbers (which is redundant since
the last set of numbers is the default anyway). Again, you get enough rope to
hang yourself so don't use an index higher than the number of numbers
in the file name.
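To make the indexing rule concrete, here is a small sketch of how a 1-based index with a default of -1 might be mapped onto an array of extracted numbers. resolve_numeric_index is a hypothetical name, and both the range check and the treatment of other negative values (counting back from the end) are my assumptions, not behaviour of the actual script.

```perl
use strict;
use warnings;

# Hypothetical helper: map the user-facing numeric index (1-based, or
# negative to count from the end) onto a zero-based array position.
sub resolve_numeric_index {
    my ($index, $count) = @_;
    $index = -1 unless defined $index;    # documented default: last number
    my $pos = $index < 0 ? $count + $index : $index - 1;
    die "Numeric index $index is out of range for $count number(s)\n"
        if $index == 0 || $pos < 0 || $pos >= $count;
    return $pos;
}

my @numbers = "01-file01.html" =~ /(\d+)/g;
print resolve_numeric_index(undef, scalar @numbers), "\n";   # 1 (the last number)
print resolve_numeric_index(1,     scalar @numbers), "\n";   # 0 (the first number)
```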
The precision switch controls how many zeros are prepended to 'short' numbers,
i.e. should the first file be file1.html, file01.html, file001.html, etc. For
default values, the script first looks at the precision of min if it's present,
then max. If neither value is specified, the script defaults to the precision
in the input URI, meaning if you use the filename file23.html you'll get
two digits of precision whether you want them or not.
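The fallback order for precision could be sketched roughly like this. pick_precision and its option names are hypothetical, and I am assuming "the precision of min" means the number of characters the user typed (so a min of 001 gives three digits).

```perl
use strict;
use warnings;

# Hypothetical helper: choose the padding width in the documented order:
# an explicit precision, then min, then max, then the number in the URI.
sub pick_precision {
    my (%opt) = @_;
    return $opt{precision} if defined $opt{precision};
    for my $key (qw(min max digit)) {
        return length $opt{$key} if defined $opt{$key};
    }
    die "Nothing to derive a precision from\n";
}

my $width = pick_precision(digit => '23');   # only the input URI is known
# %0*d takes the field width from the argument list
printf "file%0*d.html\n", $width, $_ for 7 .. 9;
```

With only file23.html to go on, this prints file07.html, file08.html and file09.html, which is the "two digits whether you want them or not" case.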
The reverse switch simply prints out the list of URIs in order from max to min
rather than from min to max.
The verbose switch turns on some basic warnings such as the detected precision
and whether or not the min and max values were swapped.
--
Grant me the wisdom to shut my mouth when I don't know what I'm talking about.
Re: Specific instance of a repeated string
by graff (Chancellor) on Aug 09, 2003 at 22:29 UTC
01-file-01.html
01-file-02.html
01-file-03.html
...
Only one number is ever going to form the basis for the list so the other numbers in a filename should be part of either the prefix or the suffix. I get the feeling I'm not explaining this very well...
--
Grant me the wisdom to shut my mouth when I don't know what I'm talking about.
Re: Specific instance of a repeated string
by Ionizor (Pilgrim) on Aug 09, 2003 at 22:43 UTC
I was hoping what I was trying to accomplish would be clear without having to post all the code but apparently it's not. Here is the full code for my script, which is a rewrite of the script in this node to create a sequential list of files.
--
Grant me the wisdom to shut my mouth when I don't know what I'm talking about.
my @a = $filename =~ m/ (\D+)? (\d+)? /xg;
Instead of using a regex followed by a split, try using just a regex.
# non-greedy prefix, so (\d+) captures the whole number and not just its last digit
if ( $filename =~ /^(.*?)(\d+)\.(.*)$/ )
{
$prefix = $1;
$digit = $2;
$suffix = $3;
}
else
{
print "Error etc\n";
}
You may need to tweak the regex. I am unable to test it from my current location.
Unfortunately this won't do what I need it to do. This will only work for files that look like foo01.bar. Some of my files look like: foo10bar.baz. This regex also assumes that it's the last number in the filename that I want to operate on which isn't always the case.
Thanks for the suggestion though, it is appreciated.
--
Grant me the wisdom to shut my mouth when I don't know what I'm talking about.
Re: Specific instance of a repeated string
by CombatSquirrel (Hermit) on Aug 10, 2003 at 16:41 UTC
If you want to be able to change any number in the file name, you would probably like to have it split into chunks. That is the idea; I realize that shenme has already written a piece of code utilizing this, but TIMTOWTDI, so I decided to write another:
#!perl -w
use strict;

for my $name (<DATA>) {
    chomp $name;
    my %parts;
    $name =~ s/(.*)(\.[^.]*)/$1/;
    $parts{"suffix"} = $2 or die "Not a valid file name: $name";
    $parts{"prefix"} = [];
    $parts{"number"} = [];
    while ($name =~ s/^(\D*)(\d+)//) {
        push @{$parts{"prefix"}}, $1;
        push @{$parts{"number"}}, $2;
    }
    $name and die "Invalid file name format. Rest '$name' remained";
    print "Filename splits as follows: ["
        . join("][",
               map { "(" . $parts{"prefix"}->[$_] . ")("
                     . $parts{"number"}->[$_] . ")" } 0..@{$parts{"number"}}-1)
        . "]<"
        . $parts{"suffix"} . ">\n";
}
__DATA__
01-html02.html
01-htm23-43.htm
01-file-01.html
The program first extracts the file suffix (the file ending after the last dot; I hope that I didn't misunderstand you here) and then loops through the rest of the file name, each time taking a (possibly empty) prefix and a number from the front and storing them in the anonymous arrays in $parts{"prefix"} and $parts{"number"}.
If you want to increment the $ith number now, you would just have to write
++$parts{"number"}->[$i];
$filename = join('',
map { $parts{"prefix"}->[$_]
. $parts{"number"}->[$_] }
0..@{$parts{"number"}}-1)
. $parts{"suffix"};
Hope that helped.
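A side note on the ++ above: as far as I know it keeps the zero padding only because the captured numbers have never been used as anything but strings, which is when Perl's string ("magic") auto-increment applies. Here is a small sketch of the difference, with variable names of my own choosing:

```perl
use strict;
use warnings;

my ($number) = "file07.html" =~ /(\d+)/;   # a capture, so far only a string
my $added    = "07" + 1;                   # numeric addition drops the padding
++$number;                                 # string increment keeps it
print "$added $number\n";                  # 8 08

my $wide = "99";
++$wide;                                   # carries into an extra digit
print "$wide\n";                           # 100
```

If the value is used in numeric context first (for example, compared with == against max), a later ++ may increment it as a plain number, so sprintf with an explicit width is the safer choice when the padding must be guaranteed.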
I guess I wasn't clear. The file suffix is the part of the file name after the number I'm operating on (for the file foo10bar.baz the suffix would be bar.baz). In most cases the suffix is just the file extension, but I'm trying to make the script handle any filename regardless of the position of the digits.
I've gotten some inspiration from your code though, so thanks! I'll let you know how it turns out.
--
Grant me the wisdom to shut my mouth when I don't know what I'm talking about.
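For what it's worth, the chunk idea can be bent to that definition of prefix and suffix. This is only a sketch under my own assumptions: split_around is a hypothetical name, the position is zero-based, and negative positions count from the end the way Perl array indices do.

```perl
use strict;
use warnings;

# Hypothetical helper: return (prefix, digit, suffix) around the number at
# zero-based position $which; negative values count from the end.
sub split_around {
    my ($filename, $which) = @_;
    # The capturing group makes split keep the separators, so the list
    # alternates: non-digits, digits, non-digits, digits, ...
    my @chunks = split /(\d+)/, $filename;
    my $count  = int(@chunks / 2);        # numbers sit at the odd positions
    $which += $count if $which < 0;
    die "'$filename' has no number at that position\n"
        if $which < 0 || $which >= $count;
    my $at = 2 * $which + 1;
    return (
        join('', @chunks[0 .. $at - 1]),          # everything before the number
        $chunks[$at],                             # the number itself
        join('', @chunks[$at + 1 .. $#chunks]),   # everything after it
    );
}

my ($prefix, $digit, $suffix) = split_around("foo10bar.baz", -1);
print "$prefix|$digit|$suffix\n";   # foo|10|bar.baz
```

Any other numbers in the name simply end up inside the prefix or the suffix, since everything on either side of the chosen chunk is joined back together.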