joeymac has asked for the wisdom of the Perl Monks concerning the following question:

Good morning wise monks!

I am working on improving a script that I have written which does some file transfers based on directory listing comparisons. I have basically read a remote directory and local directory listing (a la $ftp->ls() and readdir) into two separate arrays and used the List::Compare module to compare the contents of them ($lc->get_unique). If there are "new" files in the remote directory, they are retrieved to the local directory. What I would like to do now, is compare the local directory to a third (also local) directory. If any files exist in both local directories, then I don't want the script to retrieve these files from the remote directory (i.e. the files are retrieved to local directory #1 then copied to local directory #2, but I don't ever want any duplicates to end up in local directory #2). Any suggestions about what may be the best way to accomplish this task would be greatly appreciated! Thanks!
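The three-way filtering described above can be sketched with plain hash lookups in core Perl (no extra module needed): build one lookup per local directory and keep only remote names found in neither. The listings below are made-up stand-ins for what `$ftp->ls()` and `readdir` would return:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample listings standing in for $ftp->ls() and readdir():
my @remote = qw(a.txt b.txt c.txt d.txt);
my @local1 = qw(a.txt b.txt);    # working directory
my @local2 = qw(c.txt);          # destination directory

# Build lookup hashes; a hash key test is the usual Perl set-membership check.
my %in_local1 = map { $_ => 1 } @local1;
my %in_local2 = map { $_ => 1 } @local2;

# Retrieve only files that exist in neither local directory.
my @to_fetch = grep { !$in_local1{$_} && !$in_local2{$_} } @remote;

print "@to_fetch\n";    # prints: d.txt
```

List::Compare's get_unique/get_intersection do the same set arithmetic; the hash version just makes the "in neither directory" condition explicit in one grep.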

Replies are listed 'Best First'.
Re: Directory comparisons
by RichardK (Parson) on Dec 20, 2011 at 12:53 UTC

    I'd just use rsync and not have to write any code at all :)

Re: Directory comparisons
by zwon (Abbot) on Dec 20, 2011 at 13:29 UTC
    If any files exist in both local directories, then I don't want the script to retrieve these files from the remote directory

    This statement is not easy to understand, and it kinda contradicts what you say both before and after it.

    Anyway, why don't you use the same List::Compare to compare the local directory to the third (also local) directory? Some sample data demonstrating what you're trying to achieve would be welcome.

      Here is (most of) the script, or at least the relevant bits! In this version I have gotten the intersection of the two lists, but what should I do with it?

      #!/usr/bin/perl
      use warnings;
      use strict;
      use File::Copy;
      use File::Basename;
      use List::Compare;
      use Net::FTP;
      use Net::Netrc;

      our @globals;
      our @ProdRec;
      our @rmtDirList;
      our @matches;
      our @tempMatches;
      our @wrkDirList;
      our @onlyInRmtDir;
      our @filesToMove;
      our @filesToRetrv;
      our @currentDirList;
      our $ftp;
      our $readyFile;
      our $workingDir;
      our $ftpFailDir;
      our $fileToGet;
      our $remoteDir;
      our $finalDir;
      our $getFile;

      open CONFIG, "/home/config/Config_lcl.txt"
          || print("Can't open configuration file");
      my $config = join " ", <CONFIG>;
      close CONFIG;
      eval $config;
      print "Couldn't evaluate the config file: $@\n" if $@;

      open( LOG, ">>", $globals[0]{'logName'} )
          || print("Can't open log file\n");

      for (;;) {
          my $loopSleep    = $globals[2]{'sleepTime'};
          my $resultsFound = 0;
          $workingDir = $globals[2]{'workingDir'};
          opendir( DIR1, $workingDir )
              || print LOG "Cannot open working directory: $workingDir\n";
          my @wrkDirList = readdir(DIR1);
          closedir DIR1;

          my $destinationDir = $globals[2]{'destinationDir'};
          opendir( DIR2, $destinationDir ) || print "Can't open dist dir\n";
          my @destDirList = readdir(DIR2);
          closedir DIR2;

          for ( my $i = 0 ; $i < $dirNum ; $i++ ) {
              $remoteDir = $ProdRec[$i]{'remoteDir'};
              my @currentDirList = &ftpDirList;
              push( @rmtDirList, @currentDirList );
          }

          foreach my $item (@wrkDirList) {
              my ( $fileName, $filePath, $fileExt ) =
                  fileparse( $item, qr/\.[^.]*/ );
              $item = $fileName;
          }

          my $lc           = List::Compare->new( \@rmtDirList, \@wrkDirList );
          my @onlyInRmtDir = $lc->get_unique;
          my $lc2          = List::Compare->new( \@destDirList, \@wrkDirList );
          my @inDestDir    = $lc2->get_intersection;

          for ( my $i = 0 ; $i < $prodNum ; $i++ ) {
              $finalDir   = $ProdRec[$i]{'finalDir'};
              $ftpFailDir = $ProdRec[$i]{'ftpFailDir'};
              if ( @onlyInRmtDir > 0 ) {
                  my $grepString = $ProdRec[$i]{grepString};
                  $resultsFound = 1;
                  my @tempMatches = grep { /$grepString/ } @onlyInRmtDir;
                  push( @matches, @tempMatches );
              }
              if ( @inDestDir > 0 ) {
                  foreach my $destFile (@inDestDir) {
                      print "In destDir: $destFile\n";
                      chdir($workingDir);
                      my $pwd = `pwd`;
                      print "PWD: $pwd\n";
                      #unlink $destFile;
                  }
              }
          }

          ftpFileGet();
          @matches = ();

          foreach my $getFile (@filesToMove) {
              print LOG "------------------------------------------------\n";
              print LOG "Found new file: $getFile\n";
              print LOG "Download of $getFile successful", "\n";
              $readyFile = "$getFile" . "$globals[1]{'ready'}";
              copy( "$workingDir/$getFile", "$finalDir/$readyFile" );
              print LOG "\n$getFile \n renamed to \n$readyFile\n and moved to $finalDir\n";
          }
          @onlyInRmtDir = ();
          @wrkDirList   = ();
          sleep $loopSleep;
      }

      sub ftpDirList {
          my $host = $ProdRec[0]{'remoteServ'};
          $ftp = Net::FTP->new($host);
          if ( defined($ftp) ) {
              my $TODAY = time;
              print "Established ftp connection with $host", "\n";
              print "at: ", scalar( localtime($TODAY) ), "\n";
          }
          else {
              for (;;) {
                  print "Unable to make connection with $host", "\n";
                  print "Will try to reconnect in 5 seconds to $host", "\n";
                  sleep 5;
                  $ftp = Net::FTP->new($host);
                  if ( defined($ftp) ) {
                      my $TODAY = time;
                      print "Established ftp connection with $host", "\n";
                      print "at: ", scalar( localtime($TODAY) ), "\n";
                      last;
                  }
              }
          }
          $ftp->login();
          $ftp->pasv;
          $ftp->binary;
          my @currentDirList = $ftp->ls();
          $ftp->quit();
          print "Connection to $host closed\n";
          return @currentDirList;
      }

      # Subroutine to ftp connect and retrieve data that is "new".
      sub ftpFileGet {
          my $host = $ProdRec[0]{'remoteServ'};
          $ftp = Net::FTP->new( $host, Timeout => 60 );
          if ( defined($ftp) ) {
              my $TODAY = time;
              print "Established ftp connection with $host here", "\n";
              print "at: ", scalar( localtime($TODAY) ), "\n";
          }
          else {
              for (;;) {
                  print "Unable to make connection with $host", "\n";
                  print "Will try to reconnect in 5 seconds to $host", "\n";
                  sleep 5;
                  $ftp = Net::FTP->new( $host, Timeout => 60 );
                  if ( defined($ftp) ) {
                      my $TODAY = time;
                      print "Established ftp connection with $host here\n";
                      print "at: ", scalar( localtime($TODAY) ), "\n";
                      last;
                  }
              }
          }
          $ftp->login();
          $ftp->pasv;
          $ftp->binary;
          chdir($workingDir);
          foreach my $fileToGet (@matches) {
              my $remoteFileSize = $ftp->size($fileToGet);
              my $localFileName  = "$fileToGet" . "$globals[1]{'transfer'}";
              my $ftpReturnVar   = $ftp->get( $fileToGet, $localFileName );
              next if !defined $ftpReturnVar;
              my $localFileSize = ( stat "$workingDir/$localFileName" )[7];
              if ( $remoteFileSize == $localFileSize ) {
                  push( @filesToRetrv, $ftpReturnVar );
              }
              else {
                  print "File sizes don't match\n";
                  print "\n$ftpReturnVar\n moved to\n $ftpFailDir\n";
                  my $failFileName = "$ftpReturnVar" . "$globals[1]{'failed'}";
                  move( $ftpReturnVar, "/$ftpFailDir/$failFileName" );
              }
          }
          foreach my $item (@filesToRetrv) {
              my ( $fileName, $filePath, $fileExt ) =
                  fileparse( $item, qr/\.[^.]*/ );
              if ( rename( $item, $fileName ) ) {
                  push( @filesToMove, $fileName );
              }
              else {
                  print LOG "Rename failed for $fileToGet to $fileName\n";
              }
          }
          $ftp->quit();
          print "Connection to $host closed\n";
          return @filesToMove;
      }

      What happened was that there was a connection failure with the remote server, and when the script was restarted some duplication of data occurred; it has happened sporadically ever since. Oddly enough, it never happened in operation before the remote server failure. My script retrieves files from the remote directory to the local directory; the files are then picked up by other scripts out of my control and placed into the third (also local) directory, among other places. From there they are distributed to customers/websites (by scripts also out of my control), so duplication of data is a big no-no. I am considering using the $lc->get_intersection method (List::Compare) to find files that appear in both directory lists, but I'm not sure of the best way to check that files have not already been transferred. Someone has suggested a rotating stand-alone file that serves as a "memory" of which files were successfully transferred during a given time frame, but this sounds like a bit of work.
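      The suggested "memory" file need not be much work. A minimal sketch of the idea, load the log into a hash, filter candidates against it, and append what was sent (the log path is hypothetical, and a temp directory is used only to keep the example self-contained):

      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Temp qw(tempdir);

      my $dir       = tempdir( CLEANUP => 1 );
      my $seen_file = "$dir/transferred.log";    # hypothetical "memory" file

      # Load names already transferred, one per line.
      sub load_seen {
          my %seen;
          if ( open my $fh, '<', $seen_file ) {
              chomp( my @lines = <$fh> );
              $seen{$_} = 1 for @lines;
              close $fh;
          }
          return %seen;
      }

      # Append names after a successful transfer.
      sub record {
          open my $fh, '>>', $seen_file or die "Can't append to $seen_file: $!";
          print {$fh} "$_\n" for @_;
          close $fh;
      }

      my @candidates = qw(x.dat y.dat);

      # First pass: nothing recorded yet, so both files are "new".
      my %seen = load_seen();
      my @new  = grep { !$seen{$_} } @candidates;
      record(@new);
      print "first pass: @new\n";

      # Second pass (e.g. after a restart): the log remembers both files,
      # so nothing is re-sent even though the directories have changed.
      %seen = load_seen();
      @new  = grep { !$seen{$_} } @candidates;
      print "second pass: @new\n";

      Because the log survives restarts, a connection failure mid-run no longer causes files to be fetched twice; rotating the log by date would keep it from growing without bound.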

        but I'm not sure the best way to check that files here have not already been transferred

        And by transferred here you mean that the file was retrieved to the local directory, copied to the third directory, and then removed by some other process, right? In that case, yes, you need to keep a log of files which were successfully transferred (or rather just a list of the files that were copied to the third directory).
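        A minimal sketch of that suggestion, appending to the list only when copy() reports success, so a failed copy is simply retried on the next pass (the file names and paths here are hypothetical temp files so the example is self-contained):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Copy;
        use File::Temp qw(tempdir);

        my $dir       = tempdir( CLEANUP => 1 );
        my $seen_file = "$dir/copied.log";    # list of files sent to the third directory

        # Create one real source file; "missing.txt" deliberately does not exist.
        open my $src, '>', "$dir/a.txt" or die $!;
        print {$src} "payload\n";
        close $src;

        for my $file ( 'a.txt', 'missing.txt' ) {
            # copy() returns true on success, false on failure.
            if ( copy( "$dir/$file", "$dir/$file.ready" ) ) {
                open my $log, '>>', $seen_file or die "Can't append: $!";
                print {$log} "$file\n";
                close $log;
            }
            else {
                warn "copy of $file failed: $!\n";
            }
        }

        open my $in, '<', $seen_file or die $!;
        chomp( my @copied = <$in> );
        close $in;
        print "copied: @copied\n";    # only a.txt made it into the log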