At the risk of sounding like a broken record, I would recommend exploring existing mirroring solutions, for example:
w3mir - a script for recursively mirroring HTTP and FTP resources, with support for access authorisation, proxy connections, and regular-expression matching and exclusion.
wget - a freely available network utility that can retrieve and mirror files from HTTP and FTP resources.
rsync - an open-source utility that provides fast incremental file transfer. A Perl interface to rsync (File::Rsync) is available from CPAN.