Cody Pendant has asked for the wisdom of the Perl Monks concerning the following question:

I have to copy a rather large website to HD.

There are millions of files, but the good news is, I don't have to copy the files, just the folder structure, to a mirror.

The bad news is, I can't figure out how to do this on Windows. There are FTP utilities on Linux which would do it, for instance ftpcopy seems to have this option, but nothing for Windows.

So, could I do this with Net::FTP::Recursive? It can recurse through a whole remote structure, copying only files which match a regex, but a regex can't tell a directory from a file. I need something like -d instead.

Any suggestions gratefully received.



($_='kkvvttuu bbooppuuiiffss qqffssmm iibbddllffss')
=~y~b-v~a-z~s; print

Replies are listed 'Best First'.
Re: FTP Get Directory Tree Only
by randyk (Parson) on Aug 15, 2006 at 02:25 UTC
    If you have access to the server, or know someone who does, one solution would be to generate a ls-lR.gz file, listing all the files below the desired root, and then grab and parse this file. For this, File::Listing would be handy.
      If that's an acceptable solution, then you might also try the command:
      find . -type d
      which will give you a list of directories in a more convenient form, e.g.:
      .
      ./.cpan
      ./.cpan/build
      ./.cpan/build/Authen-SASL-2.10
      ./.cpan/build/Authen-SASL-2.10/blib
      ./.cpan/build/Authen-SASL-2.10/blib/arch
      ./.cpan/build/Authen-SASL-2.10/blib/arch/auto
      ./.cpan/build/Authen-SASL-2.10/blib/arch/auto/Authen
      ./.cpan/build/Authen-SASL-2.10/blib/arch/auto/Authen/SASL
      ./.cpan/build/Authen-SASL-2.10/blib/lib
      ./.cpan/build/Authen-SASL-2.10/blib/lib/Authen
      ./.cpan/build/Authen-SASL-2.10/blib/lib/Authen/SASL
      ./.cpan/build/Authen-SASL-2.10/blib/lib/Authen/SASL/Perl
      ./.cpan/build/Authen-SASL-2.10/blib/lib/auto
      ./.cpan/build/Authen-SASL-2.10/blib/lib/auto/Authen
      ./.cpan/build/Authen-SASL-2.10/blib/lib/auto/Authen/SASL
      . . . etcetera . . .
      --roboticus
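Once a directory list like the one above has been transferred, recreating the tree locally is a short loop. This is only a sketch: the sub name `mirror_dirs` is made up for illustration, and it assumes one path per line with forward slashes, as `find . -type d` prints them.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Recreate each listed directory under $root.
# Returns the number of directories that had to be created.
sub mirror_dirs {
    my ($root, @lines) = @_;
    my $created = 0;
    for my $line (@lines) {
        (my $dir = $line) =~ s/\r?\n\z//;    # tolerate Unix and DOS line endings
        $dir =~ s{^\./}{};                   # drop the leading "./" that find prints
        next if $dir eq '' or $dir eq '.';
        # Skip anything that tries to climb out of $root.
        next if grep { $_ eq '..' } split m{/}, $dir;
        my $target = File::Spec->catdir($root, split m{/}, $dir);
        next if -d $target;
        make_path($target);
        $created++;
    }
    return $created;
}
```

With the list saved to a file (say dirs.txt, a hypothetical name), the call would be something like `open my $fh, '<', 'dirs.txt' or die $!; mirror_dirs('mirror', <$fh>);`.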
      Thanks for that. Great idea.


Re: FTP Get Directory Tree Only
by Cody Pendant (Prior) on Aug 15, 2006 at 02:55 UTC
    I have arrived at a solution, unfortunately a rather hacky one.

    I used wget with a long long list of "rejected" files:

    wget -R htm,txt,inc,ram,css,asx,jpg,gif,html,doc,xml,pdf,mp3,log,rtf,[etc] -r -nH --level=0 --force-directories ftp://<username>:<password>@<server>/<path>/
    (the actual command was all on one line)

    and then I used File::Find locally to go through and remove any files which accidentally made it through.

    Of course any more interesting perl solutions would still be welcome. This could happen again...
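For the local cleanup step, something along these lines would do. It is a sketch rather than the code actually used: `remove_stray_files` is a hypothetical name, and it deletes every non-directory under the given root, so it should only ever be pointed at the mirror.

```perl
use strict;
use warnings;
use File::Find;

# Delete everything under $root that is not a real directory,
# leaving the folder structure in place. Returns the number of files removed.
sub remove_stray_files {
    my ($root) = @_;
    my $removed = 0;
    find({
        no_chdir => 1,    # $_ holds the full path, no chdir into each directory
        wanted   => sub {
            return if -d $_ && !-l $_;    # keep real directories
            if (unlink $_) {
                $removed++;
            }
            else {
                warn "Could not remove $_: $!\n";
            }
        },
    }, $root);
    return $removed;
}
```

Usage would be `remove_stray_files('path/to/mirror')`, with the path adjusted to wherever wget wrote the tree.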



      Two other ideas:
      1) I haven't tried it, but while reviewing the man page for wget, I noticed the option --delete-after. Perhaps it would leave the directory structure behind after getting and deleting the local files.
      2) A while ago, I wrote a program using Net::FTP to get hierarchical collections of files. I just chopped it up to give you a starting point:
      #!/usr/bin/perl
      #==============================================================
      # GetDirStruct.pl
      #==============================================================
      use strict;
      use warnings;
      use Net::FTP;

      my $host = 'your.ftp.server';
      my $uid  = 'user_id';
      my $pwd  = 'pa$$word';

      my $ftph = Net::FTP->new($host) or die "Can't connect to $host: $@";
      $ftph->login($uid, $pwd) or die "Can't login: " . $ftph->message;
      GetFiles('.');
      $ftph->quit;

      sub GetFiles {
          my $pref = shift;
          # NOTE: Assumes UNIX ls format!
          #my @DirList = grep {/^d/} $ftph->ls;
          my @DirList = $ftph->dir;
          for my $DirEnt (@DirList) {
              next if $DirEnt !~ /^d/;    # directories only
              my $DirName = (split / +/, $DirEnt, 9)[8];
              # Some servers list "." and ".."; skip them to avoid endless recursion
              next if $DirName eq '.' or $DirName eq '..';
              my $newpref = $pref . '/' . $DirName;
              print "mkdir $newpref;\n";
              $ftph->cwd($DirName) or next;    # skip directories we can't enter
              GetFiles($newpref);
              $ftph->cwd("..");
          }
      }
      Running this against an FTP server I have access to gives:
      $ ./GetDirStruct.pl
      mkdir ./Files;
      mkdir ./Files/Bar;
      mkdir ./Files/Bar/Baz;
      mkdir ./Files/Foo;
      mkdir ./Inquiry;
      mkdir ./NpcIN;
      mkdir ./Processing;
      mkdir ./Security;
      mkdir ./Sysmenu;
      Just edit the print statement to actually make it create the directory hierarchy.
      NOTE: You may encounter an oddball FTP server that uses a different format for the DIR command. (I don't think that the results of the DIR command are in the FTP standard.) So you may have to edit the directory-name parsing portion of the code for your server. (I ran into this problem on an IBM mainframe FTP server.)
      --roboticus
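One way the print statement might be swapped for real directory creation, with the listing parser pulled into its own sub so it can be replaced for servers with a different DIR format. The names `unix_dir_name` and `mirror_tree` are hypothetical, UNIX-style listing lines are assumed, and the walker only relies on the `dir` and `cwd` methods that Net::FTP provides.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Return the entry name from one UNIX-style "ls -l" line,
# or nothing if the line is not a directory (or is "." / "..").
sub unix_dir_name {
    my ($line) = @_;
    return unless defined $line && $line =~ /^d/;
    my $name = (split ' ', $line, 9)[8];    # 9th field keeps embedded spaces
    return unless defined $name;
    $name =~ s/\r?\n\z//;
    return if $name eq '' or $name eq '.' or $name eq '..';
    return $name;
}

# Walk the remote tree from the current remote directory and create the
# same directories under $local. Returns the number of remote directories visited.
sub mirror_tree {
    my ($ftp, $local) = @_;
    make_path($local) unless -d $local;
    my $seen = 0;
    for my $name (map { unix_dir_name($_) } $ftp->dir) {
        next unless $ftp->cwd($name);       # skip directories we can't enter
        $seen += 1 + mirror_tree($ftp, File::Spec->catdir($local, $name));
        $ftp->cwd('..');
    }
    return $seen;
}
```

With a logged-in Net::FTP object such as `$ftph` from the script above, the call would be `mirror_tree($ftph, 'mirror')`. For an oddball server, only `unix_dir_name` should need changing.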
Re: FTP Get Directory Tree Only
by aajello (Initiate) on Aug 15, 2006 at 16:14 UTC
    How about "xcopy /T /E"?
    /E: Copies directories and subdirectories, including empty ones.
    /T: Creates directory structure, but does not copy files.
Re: FTP Get Directory Tree Only
by ftumsh (Scribe) on Aug 17, 2006 at 12:17 UTC
    If it's a one-off... find -type d gets all the directories. Zip that up, transfer the zip file, unzip it.