in reply to Comparing files in directory

Hello smturner1,

I am trying to understand how this subroutine works and I just cannot work out the logic.

Despite the various missing pieces ($ARCHIVE, $SourceDir, @errors, ...), the general logic of the subroutine is fairly straightforward. The aim is to populate an array, @diffs, with details of the differences between the two directories $ARCHIVE and $SourceDir. First, the contents of $ARCHIVE are stored as keys in the hash %old_list, and the contents of $SourceDir are stored as keys in the hash %new_list. With this information available, it is then easy to determine which files are present in $ARCHIVE but not in $SourceDir:

  for my $file (sort keys %old_list) {
      if ( !defined $new_list{$file} ) {
          push @diffs, "Old file not in new: $file";
      }
  }

This tests each filename in %old_list and looks for it in %new_list. And this code reveals the reason for storing filenames as hash keys rather than as array elements: it is much simpler to test whether a key is present in a hash using defined than to search through an array. Incidentally, the call to sort is pointless here, and exists is usually preferable to defined in this situation.
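To make the exists/defined distinction concrete, here is a minimal sketch. The file names and the undef value are invented for illustration; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical file lists; the names are illustrative only.
my %old_list = map { $_ => 1 } qw(a.txt b.txt c.txt);
my %new_list = ( 'a.txt' => 1, 'b.txt' => undef );   # b.txt is a key, but its value is undef

# exists asks "is this key in the hash?"
my @missing_exists  = grep { !exists  $new_list{$_} } sort keys %old_list;

# defined asks "is the value stored under this key defined?"
my @missing_defined = grep { !defined $new_list{$_} } sort keys %old_list;

print "exists:  @missing_exists\n";
print "defined: @missing_defined\n";
```

With exists only c.txt is reported as missing, whereas defined also reports b.txt, because its key is present but its value is undef. When every value is set to 1, as in the subroutine above, the two tests agree, which is why the original code still works.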

In a similar manner, the next for loop identifies those files which are present in $SourceDir but absent from $ARCHIVE.

The specifically “Perlish” aspect to this code is the elegant use of hashes to search through lists. See, for example, How can I tell whether a certain element is contained in a list or array?

Hope that helps,

Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re^2: Comparing files in directory
by smturner1 (Sexton) on Dec 11, 2014 at 19:30 UTC

    Athanasius, your explanation was very helpful. I will tune-up the loop construct as you have suggested.

    As far as the custom sub goes (findfiles), there are not any files that go with it. I am the original developer of this code with the help of a Senior Developer. The section of code in question was his suggestion. He probably mentioned that I need to define findfiles, but those directions were not in my notes, nor in my recollection. He is on vacation now.

    I was researching File::Find and came up with this piece of code. It is a little cryptic and I feel it is missing something. Monks, can you please take a look and help me with the details? I noted the new code with "New piece of code".

      use strict;
      use warnings;
      use diagnostics;
      use autodie;
      use File::Copy::Recursive qw( rcopy );
      use File::Path qw( mkpath remove_tree );
      use File::Glob qw( :globally );
      use File::Find;
      use File::Spec;
      use Getopt::Long;
      use Pod::Usage;
      use POSIX qw(strftime);
      use File::Spec::Functions;

      #Set command line arguments
      my ($website, $old_ip, $new_ip) = @ARGV;

      #Set vars
      my $TIMESTAMP = strftime("%Y%m%d%H%M", localtime);

      #my $volume = 'C:/';

      my $SourceDir = File::Spec->catfile((qw(C: Users st2641 Desktop Source_dir)));
      my $destinationDir = File::Spec->catfile(qw(C: Users st2641 Desktop), $website);
      my $ARCHIVE = File::Spec->catfile(qw(C: Users st2641 Desktop), join('_', $website, $old_ip, $TIMESTAMP));
      my @WEBSITES = qw( three five calnet-test );
      my $file;
      my @errors;
      my @file_list;

      sub DirCompare{
          my ( %old_list, %new_list, @diffs);

          ###for my $file ($ARCHIVE) {
          ###$old_list{$file} = 1;
          ###}

          #////New piece of code
          for my $file ( find ( sub { $new_list { $file } = 1 }, $SourceDir ) );

          for my $file (keys %old_list) {
              if ( !exists $new_list{$file} ) {
                  push @diffs, "Old file not in new: $file";
              }
          }

          for my $file (sort keys %new_list) {
              if ( !defined $old_list{$file} ) {
                  push @diffs, "Old file not in new: $file";
              }
          }

          if (@diffs) {
              my $msg = sprintf "WARNING: %s\n", join "\nWARNING: ", @errors;
              check_ok( $msg );
          }
          return;
      }

      $file (find( sub {$new_list{$file} = 1}, $SourceDir ) );

      Provided there are no sub-folders, the essentials are these:

        #!perl
        use strict;
        use File::Find;

        my $old = 'c:/....'; # as reqd
        my $new = 'c:/....'; # as reqd

        my %old_list =();
        my %new_list =();

        find( sub { $old_list{$_} = 1}, $old);
        find( sub { $new_list{$_} = 1}, $new);

        my $diffs = DirCompare();
        print "$_\n" for @$diffs;

        sub DirCompare {
            my @diffs = ();
            for my $file (sort keys %old_list) {
                if ( !defined $new_list{$file} ) {
                    push @diffs, "Old file not in new: $file";
                }
            }
            for my $file (sort keys %new_list) {
                if ( !defined $old_list{$file} ) {
                    push @diffs, "New file not in old: $file";
                }
            }
            return \@diffs;
        }
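If sub-folders do matter, one possible variation is to key each hash on the path relative to the directory being scanned, so that sub/b.txt in one tree is matched against sub/b.txt in the other. The sketch below is only an illustration under that assumption: list_files is a hypothetical helper name, and the temporary directories and file names stand in for the real $ARCHIVE and $SourceDir.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: returns a hash ref keyed by paths relative to $dir.
sub list_files {
    my ($dir) = @_;
    my $base = File::Spec->rel2abs($dir);
    my %list;
    find(
        {
            no_chdir => 1,    # stay in the current directory while walking
            wanted   => sub {
                return unless -f $File::Find::name;    # plain files only
                $list{ File::Spec->abs2rel( $File::Find::name, $base ) } = 1;
            },
        },
        $base
    );
    return \%list;
}

# Demo on throwaway directories so the sketch is self-contained.
my $old = tempdir( CLEANUP => 1 );
my $new = tempdir( CLEANUP => 1 );
mkdir( File::Spec->catdir( $old, 'sub' ) ) or die "mkdir failed: $!";
for my $path (
    File::Spec->catfile( $old, 'a.txt' ),
    File::Spec->catfile( $old, 'sub', 'b.txt' ),
    File::Spec->catfile( $new, 'a.txt' ),
) {
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

my ( $old_list, $new_list ) = ( list_files($old), list_files($new) );

my @diffs;
for my $file ( sort keys %$old_list ) {
    push @diffs, "Old file not in new: $file" unless exists $new_list->{$file};
}
print "$_\n" for @diffs;
```

Here only the file under sub in the old tree is reported, since a.txt exists in both trees. Keying on the bare $_ instead would treat two files with the same name in different sub-folders as one entry.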
      poj

        Thank you poj.

        Just some questions....

        Below, I declare my hashes and array (within the sub DirCompare)

          my ( %old_list, %new_list, @diffs);

        Next line, I am looping through $ARCHIVE and reading files or files within folders into $file. This will continue until it reaches the top of the file list or parent directory. $file will then be passed into the hash %old_list

          for my $file ( find ( sub { %old_list { $file } = 1 }, $ARCHIVE ) );

        Next line, I am looping through $SourceDir and reading files or files within folders into $file. This will continue until it reaches the top of the file list or parent directory. $file will then be passed into the hash %new_list

          for my $file ( find ( sub { %new_list { $file } = 1 }, $SourceDir ) );

        The above statements are what I intended when I wrote the code. So my questions are: why do I need to declare the hashes and arrays when they are already declared? Does $file in the innermost brackets need to be '$_' or $file?
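For reference, here is a minimal sketch of how File::Find's find is normally called, which bears on both questions. find is not something to loop over with for: it walks the directory itself and calls the supplied sub once per entry, with $_ set to the name of the current entry. A separate $file variable is never filled in by find. A single hash element is also written with a $ sigil, as in $new_list{$_}, not %new_list{...}. The directory '.' is an assumption for the demo, standing in for $SourceDir.

```perl
use strict;
use warnings;
use File::Find;

my $dir = '.';    # assumption: any readable directory, in place of $SourceDir
my %new_list;     # declared here so both the callback and later code can see it

# find() calls the sub once for every entry under $dir;
# inside the sub, $_ holds the name of the current entry.
find( sub { $new_list{$_} = 1 }, $dir );

print "$_\n" for sort keys %new_list;
```

As for the declarations: a hash declared with my inside DirCompare is visible only inside that sub. If the find calls sit outside the sub, as in poj's version, the hashes have to be declared at a scope that both the find calls and the sub can see.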