
Which module would you suggest for reliably getting the contents of a remote directory via HTTP?

AFAIK, it is entirely up to the server and the applications running on it to decide how and if a directory is rendered at any given URL. Just because I can get a document at http://someurl.com/documents/mydoc.txt doesn't mean I can get a directory at http://someurl.com/documents/. I might. But I might just as easily get a 403 Forbidden, a 404 Not Found, or whatever resource the site's author intends to serve at that particular URL.

If the OP had said that he can get a directory listing to render in his browser by entering http://someurl.com/documents/, then we could point him to LWP::Simple. But I think we're missing some information before we can guide him in that direction with any assurance that the advice will work for him.
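One way to gather that missing information is to look at what the directory URL actually returns before writing any scraping code. Below is a minimal sketch using the core HTTP::Tiny module (swapped in here in place of LWP::Simple). The helper name `describe_directory_response` is hypothetical, and treating a page titled "Index of ..." as an Apache autoindex is a heuristic assumption, not something the server guarantees.

```perl
use 5.010;
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: classify an HTTP::Tiny response hashref for a
# URL that we *hope* is a directory listing.
sub describe_directory_response {
    my ($res) = @_;

    # 403 Forbidden, 404 Not Found, etc.: no listing for us.
    # (HTTP::Tiny reports its own connection failures as status 599.)
    return "error: $res->{status} $res->{reason}"
        unless $res->{success};

    # HTTP::Tiny lower-cases header names in its response hash.
    my $type = $res->{headers}{'content-type'} // '';
    return "other resource ($type)" unless $type =~ m{^text/html}i;

    # Heuristic: Apache's mod_autoindex titles its pages "Index of /path".
    return 'apache autoindex'
        if ($res->{content} // '') =~ m{<title>\s*Index of\b}i;

    # Could be an index.html or any page the site author chose to serve.
    return 'html page (not obviously a listing)';
}

# Usage (needs network access):
#   my $res = HTTP::Tiny->new->get('http://someurl.com/documents/');
#   print describe_directory_response($res), "\n";
```

Only the "apache autoindex" case is one where a link-scraping approach is likely to work as-is.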


Dave

Re^3: Remote Directory listing
by tobyink (Canon) on Jul 08, 2012 at 07:53 UTC

    "AFAIK, it is entirely up to the server and the applications running on it to decide how and if a directory is rendered at any given URL. Just because I can get a document at http://someurl.com/documents/mydoc.txt doesn't mean I can get a directory at http://someurl.com/documents/. I might."

    I inferred from his question (where he said, "http - standard apache index") that this was already not a problem: he has a particular directory in mind, and it has a known directory listing format.
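With a known listing format, the file links can be pulled out even without Web::Magic. This is a core-only sketch that uses a regex, which assumes the simple, regular markup of Apache's default mod_autoindex output (plain relative `href`s) and would not be robust against arbitrary HTML. The helper name `listing_file_urls` is hypothetical. It applies the same filters as the Web::Magic script in this reply: skip the Parent Directory link, links with a query string (the column-sorting links), and anything ending in a slash.

```perl
use strict;
use warnings;

# Hypothetical helper: return absolute URLs of the plain files linked
# from an Apache mod_autoindex page. Regex-based, so it assumes the
# simple markup mod_autoindex emits rather than arbitrary HTML.
sub listing_file_urls {
    my ($html, $base) = @_;
    $base .= '/' unless $base =~ m{/$};    # names are relative to the directory

    my @urls;
    while ($html =~ m{<a\s[^>]*?href="([^"]+)"[^>]*>(.*?)</a>}gis) {
        my ($href, $text) = ($1, $2);
        $href =~ s/&amp;/&/g;                  # undo the entity escaping of "&"
        next if $text =~ /Parent Directory/;   # link back up the tree
        next if $href =~ /\?/;                 # column-sorting links (?C=N;O=D)
        next if $href =~ m{/$};                # subdirectories
        next if $href =~ m{^/|^[a-z][a-z0-9+.-]*:}i;   # only handle relative names
        push @urls, $base . $href;
    }
    return @urls;
}
```

The returned URLs keep their percent-encoding (e.g. `a%20b.pdf`), so they can be fetched directly.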

    "Which module would you suggest for reliably getting the contents of a remote directory via HTTP?"

    Personally I'd use Web::Magic, but I'm biased.

    use 5.010;
    use strict;
    use PerlX::MethodCallWithBlock;
    use Path::Class qw(file dir);
    use URI;
    use Web::Magic -sub => 'web';
    use XML::LibXML 2.0000;

    my $listing     = URI->new('http://buzzword.org.uk/2012/');
    my $destination = dir('/home/tai/tmp/downloaded/');

    # Make sure destination directory exists.
    $destination->mkpath;

    web($listing)
        # Die if 404 or some other error
        -> assert_success
        # Find all the links on the page
        -> querySelectorAll('a[href]')
        # Skip uninteresting links
        -> grep {
            not (
                /Parent Directory/
                or $_->{href} =~ m{\?}   # has a query
                or $_->{href} =~ m{/$}   # ends in slash
            )
        }
        # Expand relative URI references to absolute URIs
        -> map { URI->new_abs($_->{href}, $listing) }
        # Save each to the destination directory
        -> foreach {
            # Figure out name of file to save as
            my $filename = $destination->file( [$_->path_segments]->[-1] );
            # Log a message
            printf STDERR "Saving <%s> to '%s'\n", $_, $filename;
            # Save it!
            web($_)->save_as("$filename");
        };
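If Web::Magic, PerlX::MethodCallWithBlock and Path::Class aren't installed, the save step can be approximated with core modules only. This is a swapped-in sketch, not Web::Magic's API: the helper names `local_name_for` and `save_all` are hypothetical, and it assumes it is handed absolute file URLs (such as those scraped from the listing). It uses HTTP::Tiny's `mirror`, which sends If-Modified-Since when the local file already exists, so re-running it skips unchanged files.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;
use HTTP::Tiny;

# Hypothetical helper: derive a safe local filename from the last path
# segment of a URL, roughly what [$uri->path_segments]->[-1] gives.
sub local_name_for {
    my ($url, $dir) = @_;
    my ($path) = $url =~ m{^[a-z][a-z0-9+.-]*://[^/]*([^?#]*)}i
        or return;
    my $name = (split m{/}, $path, -1)[-1];
    return unless defined $name;
    $name =~ s/%([0-9A-Fa-f]{2})/chr hex $1/ge;   # percent-decode
    # Refuse empty names and anything that could escape $dir.
    return if $name eq '' || $name eq '.' || $name eq '..' || $name =~ m{[/\\]};
    return File::Spec->catfile($dir, $name);
}

# Hypothetical helper: download each URL into $dir (needs network access).
sub save_all {
    my ($dir, @urls) = @_;
    make_path($dir);                          # like $destination->mkpath
    my $http = HTTP::Tiny->new;
    for my $url (@urls) {
        my $file = local_name_for($url, $dir) or next;
        printf STDERR "Saving <%s> to '%s'\n", $url, $file;
        my $res = $http->mirror($url, $file); # 304 Not Modified also counts as success
        warn "Failed $url: $res->{status} $res->{reason}\n" unless $res->{success};
    }
}
```

The filename guard matters because the names come from the remote server: a segment like `%2E%2E` would otherwise decode to `..`.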
    perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'