in reply to Re^2: Remote Directory listing
in thread Remote Directory listing
"AFAIK, it is entirely up to the server and the applications running on it to decide how and if a directory is rendered at any given URL. Just because I can get a document at http://someurl.com/documents/mydoc.txt doesn't mean I can get a directory at http://someurl.com/documents/. I might."
I inferred from his question (where he said, "http - standard apache index") that this was already not a problem: he has a particular directory in mind, and it has a known directory listing format.
"Which module would you suggest for reliably getting the contents of a remote directory via HTTP?"
Personally I'd use Web::Magic, but I'm biased.
use 5.010;
use strict;
use PerlX::MethodCallWithBlock;
use Path::Class qw(file dir);
use Web::Magic -sub => 'web';
use XML::LibXML 2.0000;

my $listing     = URI->new('http://buzzword.org.uk/2012/');
my $destination = dir('/home/tai/tmp/downloaded/');

# Make sure destination directory exists.
$destination->mkpath;

web($listing)
    # Die if 404 or some other error
    -> assert_success
    # Find all the links on the page
    -> querySelectorAll('a[href]')
    # Skip uninteresting links
    -> grep {
        not (
            /Parent Directory/
            or $_->{href} =~ m{\?}   # has a query
            or $_->{href} =~ m{/$}   # ends in slash
        )
    }
    # Expand relative URI references to absolute URIs
    -> map { URI->new_abs($_->{href}, $listing) }
    # Save each to the destination directory
    -> foreach {
        # Figure out name of file to save as
        my $filename = $destination->file( [$_->path_segments]->[-1] );
        # Log a message
        printf STDERR "Saving <%s> to '%s'\n", $_, $filename;
        # Save it!
        web($_)->save_as("$filename");
    }
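If you want to check the skip rules without hitting the network, here is a rough sketch of just the filtering and URI-expansion steps from the pipeline above, using only the URI module. The helper name interesting_links and the [href, link text] pairs are made up for illustration and are not part of Web::Magic. It also assumes the "Parent Directory" test is applied to the link text, which is a simplification of what the grep block above matches against.

```perl
use strict;
use warnings;
use URI;

# Hypothetical helper (not part of Web::Magic): takes a base URI and a list
# of [href, link text] pairs, applies the same three skip rules as the
# pipeline above, and returns the surviving links as absolute URI strings.
sub interesting_links {
    my ($base, @links) = @_;
    my @keep;
    for my $link (@links) {
        my ($href, $text) = @$link;
        next if $text =~ /Parent Directory/;   # link back up the tree
        next if $href =~ m{\?};                # has a query (e.g. column sort links)
        next if $href =~ m{/$};                # ends in slash (a subdirectory)
        push @keep, URI->new_abs($href, $base)->as_string;
    }
    return @keep;
}

# Sample input shaped like a typical Apache index page (made-up entries).
my @found = interesting_links(
    'http://buzzword.org.uk/2012/',
    [ '?C=N;O=D',  'Name'             ],
    [ '/',         'Parent Directory' ],
    [ 'subdir/',   'subdir/'          ],
    [ 'notes.txt', 'notes.txt'        ],
);
print "$_\n" for @found;
```

With the sample input only notes.txt survives the filter, expanded against the listing URI. Everything else (fetching, parsing the HTML, saving) is still what Web::Magic is doing for you in the version above.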