in reply to Directory comparisons

If any files exist in both local directories, then I don't want the script to retrieve these files from the remote directory

This statement is not easy to understand, and it kind of contradicts what you say before and after it.

Anyway, why don't you use the same List::Compare to compare the local directory to the third (also local) directory? Some sample data to demonstrate what you're trying to achieve would be welcome.
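To make the question concrete, here is a minimal sketch of the filtering being discussed, assuming the goal is to skip any remote file whose name already appears in either local directory. It uses plain hashes rather than List::Compare so that it runs on core Perl alone. The sub name `files_to_fetch` and the sample file names are hypothetical.

```perl
use strict;
use warnings;

# Return the names in @$remote that appear in none of the local listings.
# Hypothetical helper: the name and argument layout are illustrative only.
sub files_to_fetch {
    my ($remote, @local_lists) = @_;

    # Build a lookup of every name already present locally.
    my %have;
    for my $list (@local_lists) {
        $have{$_} = 1 for @$list;
    }

    # Keep the remote names that are not in the lookup, preserving order.
    return grep { !$have{$_} } @$remote;
}

# Made-up sample data standing in for the three directory listings.
my @remote  = qw(a.dat b.dat c.dat d.dat);
my @working = qw(a.dat);
my @third   = qw(b.dat);

my @fetch = files_to_fetch(\@remote, \@working, \@third);
print "To fetch: @fetch\n";    # prints "To fetch: c.dat d.dat"
```

Note that this only helps while the files are still sitting in one of the local directories; once another process removes them, the listing no longer remembers them.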

Re^2: Directory comparisons
by joeymac (Acolyte) on Dec 20, 2011 at 14:28 UTC

    Here is (most of) the script. At least the relevant bits! In this version I have gotten the intersection of the two lists, but what to do with it?

    #!/usr/bin/perl
    use warnings;
    use strict;
    use File::Copy;
    use File::Basename;
    use List::Compare;
    use Net::FTP;
    use Net::Netrc;

    our @globals;
    our @ProdRec;
    our @rmtDirList;
    our @matches;
    our @tempMatches;
    our @wrkDirList;
    our @onlyInRmtDir;
    our @filesToMove;
    our @filesToRetrv;
    our @currentDirList;
    our $ftp;
    our $readyFile;
    our $workingDir;
    our $ftpFailDir;
    our $fileToGet;
    our $remoteDir;
    our $finalDir;
    our $getFile;

    open CONFIG, "/home/config/Config_lcl.txt" || print ("Can't open configuration file");
    my $config = join " ", <CONFIG>;
    close CONFIG;
    eval $config;
    print "Couldn't evaluate the config file: $@\n" if $@;

    open (LOG, ">>", $globals[0]{'logName'}) || print ("Can't open log file\n");

    for (;;) {
        my $loopSleep = $globals[2]{'sleepTime'};
        my $resultsFound = 0;

        $workingDir = $globals[2]{'workingDir'};
        opendir( DIR1, $workingDir ) || print LOG "Cannot open working directory: $workingDir\n";
        my @wrkDirList = readdir (DIR1);
        closedir DIR1;

        my $destinationDir = $globals[2]{'destinationDir'};
        opendir( DIR2, $destinationDir ) || print "Can't open dist dir\n";
        my @destDirList = readdir (DIR2);
        closedir DIR2;

        for (my $i=0; $i < $dirNum; $i++) {
            $remoteDir = $ProdRec[$i]{'remoteDir'};
            my @currentDirList = &ftpDirList;
            push (@rmtDirList, @currentDirList);
        }

        foreach my $item (@wrkDirList) {
            my ( $fileName, $filePath, $fileExt ) = fileparse($item, qr/\.[^.]*/);
            $item = $fileName;
        }

        my $lc = List::Compare->new( \@rmtDirList, \@wrkDirList );
        my @onlyInRmtDir = $lc->get_unique;

        my $lc2 = List::Compare->new( \@destDirList, \@wrkDirList );
        my @inDestDir = $lc2->get_intersection;

        for (my $i=0; $i < $prodNum ; $i++) {
            $finalDir = $ProdRec[$i]{'finalDir'};
            $ftpFailDir = $ProdRec[$i]{'ftpFailDir'};
            if ( @onlyInRmtDir > 0 ) {
                my $grepString = $ProdRec[$i]{grepString};
                $resultsFound = 1;
                my @tempMatches = grep { /$grepString/ } @onlyInRmtDir;
                push (@matches, @tempMatches);
            }
            if ( @inDestDir > 0 ) {
                foreach my $destFile (@inDestDir) {
                    print "In destDir: $destFile\n";
                    chdir ($workingDir);
                    my $pwd = `pwd`;
                    print "PWD: $pwd\n";
                    #unlink $destFile;
                }
            }
        }

        ftpFileGet();
        @matches = ();

        foreach my $getFile (@filesToMove) {
            print LOG "------------------------------------------------\n";
            print LOG "Found new file: $getFile\n";
            print LOG "Download of $getFile successful", "\n";
            $readyFile = "$getFile"."$globals[1]{'ready'}";
            copy("$workingDir/$getFile", "$finalDir/$readyFile");
            print LOG "\n$getFile \n renamed to \n$readyFile\n and moved to $finalDir\n";
        }

        @onlyInRmtDir = ();
        @wrkDirList = ();
        sleep $loopSleep;
    }

    sub ftpDirList {
        my $host = $ProdRec[0]{'remoteServ'};
        $ftp = Net::FTP->new($host);
        if (defined($ftp)) {
            my $TODAY = time;
            print "Established ftp connection with $host", "\n";
            print "at: ", scalar(localtime($TODAY)), "\n";
        } else {
            for (;;) {
                print "Unable to make connection with $host", "\n";
                print "Will try to reconnect in 5 seconds to $host", "\n";
                sleep 5;
                $ftp = Net::FTP->new($host);
                if (defined($ftp)) {
                    my $TODAY = time;
                    print "Established ftp connection with $host", "\n";
                    print "at: ", scalar(localtime($TODAY)), "\n";
                    last;
                }
            }
        }
        $ftp->login();
        $ftp->pasv;
        $ftp->binary;
        my @currentDirList = $ftp->ls();
        $ftp->quit();
        print "Connection to $host closed\n";
        return @currentDirList;
    }

    #Subroutine to ftp connect and retrieve data that is "new".
    sub ftpFileGet {
        my $host = $ProdRec[0]{'remoteServ'};
        $ftp = Net::FTP->new($host, Timeout => 60);
        if (defined($ftp)) {
            my $TODAY = time;
            print "Established ftp connection with $host here", "\n";
            print "at: ", scalar(localtime($TODAY)), "\n";
        } else {
            for (;;) {
                print "Unable to make connection with $host", "\n";
                print "Will try to reconnect in 5 seconds to $host", "\n";
                sleep 5;
                $ftp = Net::FTP->new($host, Timeout => 60);
                if (defined($ftp)) {
                    my $TODAY = time;
                    print "Established ftp connection with $host here\n";
                    print "at: ", scalar(localtime($TODAY)), "\n";
                    last;
                }
            }
        }
        $ftp->login();
        $ftp->pasv;
        $ftp->binary;
        chdir($workingDir);

        foreach my $fileToGet (@matches) {
            my $remoteFileSize = $ftp->size($fileToGet);
            my $localFileName = "$fileToGet"."$globals[1]{'transfer'}";
            my $ftpReturnVar = $ftp->get($fileToGet, $localFileName);
            next if ! defined $ftpReturnVar;
            my $localFileSize = (stat "$workingDir/$localFileName")[7];
            if ($remoteFileSize == $localFileSize) {
                push (@filesToRetrv, $ftpReturnVar);
            } else {
                print "File sizes don't match\n";
                print "\n$ftpReturnVar\n moved to\n $ftpFailDir\n";
                my $failFileName = "$ftpReturnVar"."$globals[1]{'failed'}";
                move($ftpReturnVar, "/$ftpFailDir/$failFileName");
            }
        }

        foreach my $item (@filesToRetrv) {
            my ( $fileName, $filePath, $fileExt ) = fileparse($item, qr/\.[^.]*/);
            if (rename($item, $fileName)) {
                push (@filesToMove, $fileName);
            } else {
                print LOG "Rename failed for $fileToGet to $fileName\n";
            }
        }

        $ftp->quit();
        print "Connection to $host closed\n";
        return @filesToMove;
    }
Re^2: Directory comparisons
by joeymac (Acolyte) on Dec 20, 2011 at 14:10 UTC

    There was a connection failure with the remote server, and when the script was restarted, some data was duplicated; this has happened sporadically ever since. Oddly enough, it never happened in operation before the remote server failure. My script retrieves files from the remote directory to the local directory; they are then picked up by other scripts outside my control and placed into the third (also local) directory (among other places). From there they are distributed to customers/websites (by scripts also outside my control), so duplication of data is a big no-no. I am considering using $lc->get_intersection (List::Compare) to find files that appear in both directory lists, but I'm not sure the best way to check that files here have not already been transferred. Someone has suggested a rotating standalone file that serves as a "memory" of which files were successfully transferred during a given time frame, but this sounds like a bit of work.

      but I'm not sure the best way to check that files here have not already been transferred

      And by "transferred" here you mean that the file was retrieved to the local directory, copied to the third directory, and then removed by some other process, right? In that case, yes, you need to keep a log of files that were successfully transferred (or rather just a list of files that were copied to the third directory).

        Yes, sorry for the vagueness. That is what I mean by transferred. So the log file(s) should be something like a "dump" of the array contents (the directory listing). I would then read the log file back in, split its contents by filename, and compare them to the directory list of files to be retrieved. If a file is in the log, don't transfer it (again). Sound right?
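The approach described above could be sketched roughly as follows, assuming a plain-text log with one file name per line. The sub names (`load_transferred`, `record_transferred`, `not_yet_transferred`) and the log location are hypothetical and not part of the posted script. Rotating or pruning the log is left out.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Read the log (one file name per line) into a hash for fast lookups.
# A missing log is treated as "nothing transferred yet".
sub load_transferred {
    my ($log_path) = @_;
    my %seen;
    return \%seen unless -e $log_path;

    open my $in, '<', $log_path or die "Can't read $log_path: $!";
    while (my $name = <$in>) {
        chomp $name;
        $seen{$name} = 1 if length $name;
    }
    close $in;
    return \%seen;
}

# Append newly transferred names to the log, one per line.
sub record_transferred {
    my ($log_path, @names) = @_;
    open my $out, '>>', $log_path or die "Can't append to $log_path: $!";
    print {$out} "$_\n" for @names;
    close $out or die "Can't close $log_path: $!";
    return;
}

# Keep only the candidates that are not in the log yet.
sub not_yet_transferred {
    my ($seen, @candidates) = @_;
    return grep { !$seen->{$_} } @candidates;
}

# Demo with made-up names; a temporary directory stands in for the real log location.
my $log = tempdir(CLEANUP => 1) . "/transferred.log";
record_transferred($log, qw(a.dat b.dat));
my $seen = load_transferred($log);
my @todo = not_yet_transferred($seen, qw(a.dat b.dat c.dat));
print "Still to fetch: @todo\n";    # prints "Still to fetch: c.dat"
```

In the posted script, the filter would presumably be applied to the list of remote files before ftpFileGet runs, and the log would be appended to only after a file has been copied successfully, so that a restart after a connection failure does not fetch the same files again.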