Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello monks and Happy Easter to everyone!
I want to use Perl to download some files from an ftp site, namely ftp://ftp.ncbi.nih.gov/genomes/Bacteria. In particular, I want to be able to download all the directories under that path (if you visit the link, you will see that there are numerous directories there), and, from each directory, I only need the files that have the extension .faa.
For example, I need to end up with the directory ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Acidovorax_JS42
containing 3 files, namely NC_008765.faa, NC_008766.faa and NC_008782.faa, because there are 3 files with the extension .faa in that directory.
I know nothing about Perl and the web, so I thought I would ask here. The only thing I know how to do is download the entire ftp site to my hard disk (using wget -b ftp://ftp.ncbi.nih.gov/genomes/Bacteria)
and then start erasing the unnecessary files (using find . -name "*.xxx" | xargs rm, where xxx is every file extension apart from the .faa files I want to keep).
As you can understand, this is very time- (and hard-disk-) consuming, so I thought I would ask for advice here.
Thank you in advance!
You may want to start by looking at the Net::FTP::Recursive module; it is well documented and has an example script you could use as the basis for your own. Read about the MatchFiles option of the rget method regarding filtering for the file extension you want.
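As a rough, untested sketch of what that could look like (it assumes rget accepts a MatchFiles regex as described in the module's documentation, and the anonymous-login e-mail address is just a placeholder):

use strict;
use warnings;
use Net::FTP::Recursive;

my $host = 'ftp.ncbi.nih.gov';
my $dir  = '/genomes/Bacteria';

my $ftp = Net::FTP::Recursive->new($host, Passive => 1)
    or die "Cannot connect to $host: $@";

$ftp->login('anonymous', 'anonymous@example.com')   # placeholder address
    or die "Cannot log in: ", $ftp->message;

$ftp->cwd($dir)
    or die "Cannot change to $dir: ", $ftp->message;

$ftp->binary;   # .faa files are plain text, but binary mode is safe

# Recurse through the subdirectories under the current local directory,
# fetching only files whose names end in .faa.
$ftp->rget( MatchFiles => qr/\.faa$/ );

$ftp->quit;

Run from the local directory where you want the tree to land; rget should recreate the remote directory structure there, keeping only the matching files.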
If you need help installing Perl modules, check out Installing Modules from the tutorials section of this site.