Downloading a range of sequential files

Ionizor has asked for the wisdom of the Perl Monks concerning the following question:

I've written up this little script that will print a range of sequential URLs based on an input URL. The idea is to generate a list of files suitable for use with wget. I was just wondering if there were any obvious improvements I could make to the regexes I'm using to extract the pathnames (or a module to eliminate them entirely) or just general improvements that could be made to the script.

Any help is much appreciated. Thanks.

#!/usr/bin/perl

require 5.6.1;

use strict;
use warnings;

die "Usage: listseq.pl <url> [min] <max>\n" unless (@ARGV >= 2 and @ARGV <= 3);

# Retrieve the URL
my $extract = shift;

# Sort out the range of numbered files we should be getting
my $min;

if (@ARGV >= 2) {
  $min = shift;
} else {
  $min = 1;
}

my $max = shift;

# Separate the path from the filename and save both
$extract =~ /^(\S*\/)(\S*?)$/;
my ($filepath, $filename) = ($1, $2);

# Pull the filename and extension from the file; determine precision (1 vs 01)
$filename =~ /^(\S+?)(\d+)(\.\S+)$/;
my ($name, $numlength, $extension) = ($1, length($2), $3);

# Print the list of filenames
for (my $i = $min; $i <= $max; $i++) {
  print ($filepath, $name, (sprintf "%0${numlength}d", $i), $extension, "\n");
}

--
Grant me the wisdom to shut my mouth when I don't know what I'm talking about.


Replies are listed 'Best First'.
Re: Downloading a range of sequential files by revdiablo (Prior) on Jul 25, 2003 at 03:03 UTC
I have a script exactly like this. It was one of my Very First Perl Scripts Ever (tm). I called it... wait for it... `urlrange`. :) One thing you might want to add is a way to specify the precision of numbers in the range. Your automagic precision detection will work in most cases, but occasionally it's nice to be able to specify it. Also, you might want to add a way to specify a suffix, rather than having one determined automatically. My script takes its args as `prefix first last suffix`. So if I had `http://blah.com/file01-blah.txt ...`, I would run: `urlrange http://www.blah.com/file0 1 5 -blah.txt`. Hopefully you will find this useful. Update: you can look at my version at my website.
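For anyone reading along, the `prefix first last suffix` interface with an optional precision could be sketched roughly as below. This is not revdiablo's actual `urlrange` (which isn't shown in the thread); the sub name `url_range` and the optional fifth `$width` argument are assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns prefix . zero-padded number . suffix for
# every number from $first to $last. $width is the padded field width;
# if omitted it falls back to the length of $first as it was typed.
sub url_range {
    my ($prefix, $first, $last, $suffix, $width) = @_;
    $suffix = '' unless defined $suffix;
    $width  = length $first unless defined $width;

    # Force numeric context so that "08" .. "10" counts as numbers
    # instead of using Perl's magical string increment.
    my ($lo, $hi) = ($first + 0, $last + 0);

    return map { sprintf '%s%0*d%s', $prefix, $width, $_, $suffix } $lo .. $hi;
}

# The example from the post above.
print "$_\n" for url_range('http://www.blah.com/file0', 1, 5, '-blah.txt');
```

With those arguments the first line printed is `http://www.blah.com/file01-blah.txt` and the last is `http://www.blah.com/file05-blah.txt`. Passing a width explicitly, e.g. `url_range('f', 8, 10, '.jpg', 3)`, would give `f008.jpg` through `f010.jpg`.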
Re^2: Downloading a range of sequential files by Ionizor (Pilgrim) on Aug 06, 2003 at 22:42 UTC
Specifying precision - good idea. Thanks! As for the suffix being automatically determined, what I had didn't actually work very well, since any file that didn't end in a number would cause a script error. I've since added automatic detection and a `die` to gracefully catch non-numeric filenames. Most of the files I'm listing end in numbers anyway, so it wasn't that big a problem for me. Eventually I'm planning to implement a switch (--activenumber="3" or something like that) that lets me specify which number to increment / decrement (in case of filenames like "results-2000-01-25-final.html"). One of my design goals with this was to make it as automagic as possible - I'm lazy, so I want to be able to just copy and paste the URLs, add parameters to the end and hit enter.
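A rough sketch of how the `die` on non-numeric filenames and the "active number" idea might fit together follows. It is not the updated script (which isn't posted here), and it swaps the original single regex for a scan over every run of digits; the sub name `split_on_number` and its 1-based `$which` argument are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: split $filename around its Nth run of digits
# (1-based; defaults to the last run). Returns (before, digits, after)
# and dies if the requested run does not exist.
sub split_on_number {
    my ($filename, $which) = @_;

    my @runs;
    while ($filename =~ /(\d+)/g) {
        push @runs, [ $-[1], $+[1] ];    # start and end offsets of this run
    }

    my $count = @runs;
    die "No number found in '$filename'\n" unless $count;
    $which = $count unless defined $which;
    die "'$filename' only has $count number(s)\n"
        if $which < 1 or $which > $count;

    my ($start, $end) = @{ $runs[$which - 1] };
    return (
        substr($filename, 0, $start),
        substr($filename, $start, $end - $start),
        substr($filename, $end),
    );
}

# Vary the third number (the day) in the filename from the post above.
my ($name, $digits, $rest) = split_on_number('results-2000-01-25-final.html', 3);
printf "%s%0*d%s\n", $name, length($digits), $_, $rest for 25 .. 28;
```

Here the three pieces come back as `results-2000-01-`, `25` and `-final.html`, so the loop prints `results-2000-01-25-final.html` through `results-2000-01-28-final.html`. A name with no digits at all, such as `index.html`, dies with a readable message instead of tripping over undefined captures.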
Re: Re: Re: Downloading a range of sequential files by revdiablo (Prior) on Aug 06, 2003 at 23:55 UTC
Note that my parameter order was designed to be as lazy as possible too. Using your example, say I had the URL `http://foobar.com/results-2000-01-25-final.html` and I wanted to get 2000-01-25 through 2000-01-28. My sequence of operations would go something like this. First I would type the command:

wget `urlrange `

Then paste the URL:

wget `urlrange http://foobar.com/results-2000-01-25-final.html`

And finally all I have to do is move my cursor over to the `25`, add a space before and after, and put in the last number of my range. My result would be something like:

wget `urlrange http://foobar.com/results-2000-01- 25 28 -final.html`

Which, I would say, is about as lazy as it can get (in terms of number of keystrokes). But... to each his own, I suppose. Your interface seems to work fine for you.
Re^4: Downloading a range of sequential files by Ionizor (Pilgrim) on Aug 07, 2003 at 02:04 UTC
Re: Downloading a range of sequential files by Nkuvu (Priest) on Jul 25, 2003 at 03:23 UTC
File::Basename may be of use to you.
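As a minimal sketch of that suggestion, `fileparse` from File::Basename can stand in for the first regex and for the extension part of the second. The URL below is made up for illustration, and the suffix pattern is just one reasonable choice.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical URL, used only for illustration.
my $url = 'http://foobar.com/pics/image07.jpg';

# fileparse returns (basename, directory, suffix). The suffix is whatever
# the pattern given after the path matches at the end of the basename.
my ($base, $dir, $ext) = fileparse($url, qr/\.[^.]*/);

print "dir=$dir base=$base ext=$ext\n";
```

On a Unix-like system this should print `dir=http://foobar.com/pics/ base=image07 ext=.jpg`. Two caveats: File::Basename follows the path rules of the operating system it runs on, and it knows nothing about URLs, so a query string would end up in the suffix. Separating `image` from `07` still needs a small regex on `$base`.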
(jeffa) Re: Downloading a range of sequential files by jeffa (Bishop) on Jul 25, 2003 at 15:23 UTC
I don't have time to explain this right now ... but here is a modification of your code that uses a plethora of CPAN modules. Hope this helps. :) Read more... (1504 Bytes) jeffa L-LL-L--L-LL-L--L-LL-L-- -R--R-RR-R--R-RR-R--R-RR B--B--B--B--B--B--B--B-- H---H---H---H---H---H--- (the triplet paradiddle with high-hat)
Re: (jeffa) Re: Downloading a range of sequential files by Ionizor (Pilgrim) on Jul 25, 2003 at 21:46 UTC
Quite helpful indeed. Thank you! `Getopt::Long` and `Pod::Usage` seem like they would be quite useful.
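Since jeffa's code is collapsed above, here is a separate, minimal sketch of how those two modules are commonly combined. It is not his version; the option names (`--min`, `--max`, `--width`) and the sub name `parse_args` are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);
use Pod::Usage qw(pod2usage);

=head1 SYNOPSIS

listseq.pl [--min N] [--width N] --max N <url>

=cut

# Parse a list of command-line words into a hash of settings.
# All option names here are hypothetical.
sub parse_args {
    my @args = @_;
    my %opt  = (min => 1);                    # default lower bound
    GetOptionsFromArray(\@args, \%opt, 'min=i', 'max=i', 'width=i', 'help')
        or pod2usage(2);                      # bad option: usage, exit status 2
    pod2usage(1) if $opt{help};               # --help: usage, exit status 1
    pod2usage('Need exactly one URL and a --max value')
        unless @args == 1 and defined $opt{max};
    $opt{url} = $args[0];
    return \%opt;
}

# Stand-in for @ARGV so the sketch runs as-is.
my $opt = parse_args(qw(--min 3 --max 7 http://foobar.com/pic01.jpg));
print "$_ = $opt->{$_}\n" for sort keys %$opt;
```

Run as written, it prints `max = 7`, `min = 3` and `url = http://foobar.com/pic01.jpg`. In a real script the call would be `parse_args(@ARGV)`, and `pod2usage` would pull its usage text from the SYNOPSIS section of the script's own POD.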