I think you could improve clarity by either renaming request_page to something more descriptive (as it does more than request the page) or by moving the call to archive_page into the main logic so that you get a more immediate idea of script flow...
my $url     = 'http://www.perlmonks.org/';
my $content = request_page($url)           or die('...');
my $archive = archive_page($url, $content) or die('...');
I also echo snowcrash's suggestion to use File::Path and LWP::Simple, but while they're powerful tools, they do have some nuances to be aware of: namely, mirror and getstore return a status code of 500 if they can't write to their target file, while mkpath propagates its errors via die (so you might want to wrap it in an eval).
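For the mkpath half of that, a minimal sketch of the eval wrapper might look something like this. Note that make_dir is a hypothetical helper name of my own, and the temporary directory is only there so the snippet can run anywhere without touching a real archive.

```perl
use strict;
use warnings;
use File::Path qw/ mkpath /;
use File::Temp qw/ tempdir /;

# Hypothetical helper (not part of the original script): returns
# true on success, or false with mkpath's error message left in $@.
sub make_dir {
    my $dir = shift;
    eval { mkpath( [$dir], 0, 0755 ); 1 };
}

# Scratch area for the demonstration, removed on exit.
my $base = tempdir( CLEANUP => 1 );

# Succeeds: missing intermediate directories are created as needed.
make_dir("$base/www/perlmonks/org")
    or die "Couldn't create directory: $@";

# Fails: a plain file sits where a directory is needed, so mkpath
# dies and the eval traps it instead of killing the script.
open my $fh, '>', "$base/blocker" or die "Can't write blocker: $!";
close $fh;

make_dir("$base/blocker/sub")
    or warn "mkpath failed: $@";
```

The trailing 1 inside the eval is what lets you tell success from failure without inspecting $@ first.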
I'd also recommend using URI which can simplify your URL parsing and ensure that you're protocol agnostic (as $url =~ s#^http://##; will break if you mirror a non-http:// URL).
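To illustrate the difference, here is a small sketch comparing the regex with URI. The helper name host_and_path and the example URLs are my own inventions for the demonstration, not anything from the original script.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper: host plus path, whatever the scheme.
sub host_and_path {
    my $uri = URI->new(shift);
    return $uri->host . $uri->path;
}

# Illustrative URLs only.
my @urls = (
    'http://www.perlmonks.org/index.pl',
    'https://www.perlmonks.org/index.pl',
    'ftp://ftp.example.com/pub/README',
);

for my $url (@urls) {

    # Regex approach: only a leading http:// is removed, so the
    # https and ftp URLs come through with their scheme intact.
    ( my $stripped = $url ) =~ s#^http://##;

    printf "%-36s regex: %-36s URI: %s\n",
        $url, $stripped, host_and_path($url);
}
```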
Actually, rolling all three modules together, you can end up with something much simpler, although perhaps still not as readable as one would like. :-)
#!/usr/bin/perl

use File::Path  qw/ mkpath /;
use LWP::Simple qw/ getstore is_success $ua /;
use URI;

use strict;
use warnings;

# ---------------------------------------------------- #

my %config = (
    'agent'   => 'cjf/0.0.1',
    'archive' => 'archive',
);

# ---------------------------------------------------- #

my $url = shift || 'http://www.perlmonks.org/'; # :-)

# Convert $url into a file path.

my $file = join '/', $config{'archive'}, url_as_path($url);

die "Couldn't parse $url\n" if $file eq $config{'archive'};

# Determine and create the directory (and any
# missing directories above) that $file will
# be saved to.  Errors get propagated via die().

my($path) = $file =~ m<(.*)/[^/]+$>;

mkpath( [$path], 0, 0755 );

# Configure our 'browser', and fetch/archive the
# requested URL.

$ua->agent( $config{'agent'} );

my $rc = getstore( $url => $file );

die "Couldn't archive $url: $rc\n" unless is_success($rc);

# ---------------------------------------------------- #

sub url_as_path {

    # Convert an URI to a Unix-style path, sans
    # any protocol

    my $url = URI->new(shift);

    (my $path = $url->host) =~ tr[.][/];

    $path .= $url->path;

    # Assign default name if $url appears to be a
    # directory instead of a file.

    $path .= 'index.html'
        if substr( $url->path, -1 ) eq "/";

    $path;
}
--k.
In reply to Re: Storing data in a directory hierarchy by Kanji, in thread Storing data in a directory hierarchy by cjf