rmckillen has asked for the wisdom of the Perl Monks concerning the following question:

Can LWP take the URL of a directory and retrieve a list of all files in the directory, or even better, download all files in the directory? How is this done?

Re: LWP capabilities
by Beatnik (Parson) on May 27, 2001 at 00:58 UTC
Re: LWP capabilities
by jorg (Friar) on May 27, 2001 at 17:54 UTC
    Most Linux distributions come with a tool called Wget. This allows you to download an entire site from a given URL. The restrictions that Beatnik mentioned apply here as well though.

    Jorg

    "Do or do not, there is no try" -- Yoda

      I'd never heard of Wget before; it's a neat little tool, but I couldn't get it to do exactly what I wanted. I'm hoping this is because I'm passing the wrong parameters, but it probably has to do with limitations placed on Wget by the remote web server. Let me set up the scenario:

      http://www.url.com/baseball/
      The "baseball" folder contains files:
      - index.html
      - picture.gif
      - page2.html
      - (folder also contains other files)

      Contained in the index.html file are references to picture.gif and page2.html. The index.html does not reference the other files... I don't know the names of these files, but I know they are there. When I run:

      wget -r -l1 --no-parent http://www.url.com/baseball/

      It will retrieve index.html, picture.gif, and page2.html, but not the other files that I know are present in the directory.

      How do I get Wget to retrieve the other files not referenced in index.html? Is it possible?

        If there are no links to the documents, there is no way of checking whether they exist (other than guessing file names, which can take forever...). On the wget note, I quote:

        Basically it comes down to: if the webserver has directory listing enabled and there is no index file, you can see the files in the directory. Whether those are accessible depends on several factors... (me on LWP)

        and:

        The restrictions that Beatnik mentioned apply here as well though. (jorg on wget)

        Greetz
        Beatnik
        ... Quidquid perl dictum sit, altum viditur.
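
To illustrate the point made in the replies above, here is a minimal sketch of what a recursive downloader can and cannot see. It is written in Python using only the standard library, not LWP, and it is not how Wget is actually implemented. The class and function names, the sample HTML, and the file name other_file.html are all hypothetical. The idea: a client only learns file names from links in the HTML it receives, so an index.html exposes only what it references, while a server-generated directory listing (if enabled) exposes everything.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect every href/src attribute value found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def files_in_directory(base_url, html):
    """Return absolute URLs of links that point at files directly
    inside base_url (no parent links, subdirectories, or sort links)."""
    parser = LinkCollector()
    parser.feed(html)
    found = []
    for link in parser.links:
        absolute = urljoin(base_url, link)
        if not absolute.startswith(base_url):
            continue  # parent directory or another site
        name = absolute[len(base_url):]
        if not name or "/" in name or "?" in name or "#" in name:
            continue  # the directory itself, a subdirectory, or a sort/anchor link
        if absolute not in found:
            found.append(absolute)
    return found


BASE = "http://www.url.com/baseball/"

# What the server returns when index.html exists: only referenced files show up.
INDEX_HTML = '<html><body><img src="picture.gif"><a href="page2.html">next</a></body></html>'

# Hypothetical auto-generated listing (directory listing enabled, no index file).
LISTING_HTML = (
    '<a href="?C=N;O=D">Name</a> <a href="/">Parent Directory</a> '
    '<a href="page2.html">page2.html</a> <a href="picture.gif">picture.gif</a> '
    '<a href="other_file.html">other_file.html</a>'
)

from_index = files_in_directory(BASE, INDEX_HTML)
from_listing = files_in_directory(BASE, LISTING_HTML)
print(from_index)
print(from_listing)
```

With the index page, other_file.html never appears in the result because nothing links to it; with the listing page it does. To try this against a real server you would fetch the page body first (for example with urllib.request.urlopen) and pass it in as the html argument.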