Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks and Happy Easter to everyone!
I want to use Perl to download some files from an ftp site, namely ftp://ftp.ncbi.nih.gov/genomes/Bacteria. In particular, I want to be able to download all the directories under that path (if you visit the link, you will see that there are numerous directories there), and, from each directory, I only need the files that have the extension .faa.
For example, I need to end up with the directory ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Acidovorax_JS42
containing 3 files, namely NC_008765.faa, NC_008766.faa and NC_008782.faa, because there are 3 files with the extension .faa in that directory.
I know nothing about Perl and the web, so I thought I would ask here. The only thing I know how to do is download the entire ftp site to my hard disk (using wget -b ftp://ftp.ncbi.nih.gov/genomes/Bacteria)
and then start erasing the unnecessary files (using find . -name "*.xxx" | xargs rm, where xxx is every file extension apart from the .faa files I want to keep).
As you can understand, this is very time- (and hard-disk-) consuming, so I thought I would ask for advice here.
Thank you in advance!
You may want to start by looking at the Net::FTP::Recursive module; it is well documented and has an example script you could use as the basis for your own. Read about the MatchFiles option of the rget method regarding filtering for the file extension you want.
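As a rough, untested sketch of what that could look like (it assumes rget accepts a MatchFiles regex as described in the module's documentation, and the anonymous-login e-mail address is just a placeholder):

use strict;
use warnings;
use Net::FTP::Recursive;

my $host = 'ftp.ncbi.nih.gov';
my $dir  = '/genomes/Bacteria';

my $ftp = Net::FTP::Recursive->new($host, Passive => 1)
    or die "Cannot connect to $host: $@";

$ftp->login('anonymous', 'anonymous@example.com')   # placeholder address
    or die "Cannot log in: ", $ftp->message;

$ftp->cwd($dir)
    or die "Cannot change to $dir: ", $ftp->message;

$ftp->binary;   # .faa files are plain text, but binary mode is safe

# Recurse through the subdirectories under the current local directory,
# fetching only files whose names end in .faa.
$ftp->rget( MatchFiles => qr/\.faa$/ );

$ftp->quit;

Run from the local directory where you want the tree to land; rget should recreate the remote directory structure there, keeping only the matching files.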
If you need help installing Perl modules, check out Installing Modules from the tutorials section of this site.