in reply to Remote Directory listing

Yes, even you can use CPAN.

perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'

Replies are listed 'Best First'.
Re^2: Remote Directory listing
by davido (Cardinal) on Jul 07, 2012 at 21:56 UTC

    Which module would you suggest for reliably getting the contents of a remote directory via HTTP?

    AFAIK, it is entirely up to the server and the applications running on it to decide how and if a directory is rendered at any given URL. Just because I can get a document at http://someurl.com/documents/mydoc.txt doesn't mean I can get a directory at http://someurl.com/documents/. I might. But I might just get a 403 - Forbidden, 404 - Not Found, or whatever resource the site's author intends to be served specifically at the URL.

    If the OP had said that he can get a directory to render on his browser by entering http://someurl.com/documents/, then we could point him to LWP::Simple. But I think we're missing some information before we can guide him in that direction with any assurance that the advice is going to work for him.


    Dave

      "AFAIK, it is entirely up to the server and the applications running on it to decide how and if a directory is rendered at any given URL. Just because I can get a document at http://someurl.com/documents/mydoc.txt doesn't mean I can get a directory at http://someurl.com/documents/. I might."

      I inferred from his question (where he said, "http - standard apache index") that this was already not a problem. That he has a particular directory in mind which has a known directory listing format.

      "Which module would you suggest for reliably getting the contents of a remote directory via HTTP?"

      Personally I'd use Web::Magic, but I'm biased.

      use 5.010; use strict; use PerlX::MethodCallWithBlock; use Path::Class qw(file dir); use Web::Magic -sub => 'web'; use XML::LibXML 2.0000; my $listing = URI->new('http://buzzword.org.uk/2012/'); my $destination = dir('/home/tai/tmp/downloaded/'); # Make sure destination directory exists. $destination->mkpath; web($listing) # Die if 404 or some other error -> assert_success # Find all the links on the page -> querySelectorAll('a[href]') # Skip uninteresting links -> grep { not ( /Parent Directory/ or $_->{href} =~ m{\?} # has a query or $_->{href} =~ m{/$} # ends in slash ) } # Expand relative URI references to absolute URIs -> map { URI->new_abs($_->{href}, $listing) } # Save each to the destination directory -> foreach { # Figure out name of file to save as my $filename = $destination->file( [$_->path_segments]->[-1] ) +; # Log a message printf STDERR "Saving <%s> to '%s'\n", $_, $filename; # Save it! web($_)->save_as("$filename"); }
      perl -E'sub Monkey::do{say$_,for@_,do{($monkey=[caller(0)]->[3])=~s{::}{ }and$monkey}}"Monkey say"->Monkey::do'