in reply to file comparison

OK, now if I go the md5 route, how could I accomplish this? Here is the code I have started:
opendir(DIR, $RemoteSubDirectory);
my @rFileCheck = readdir(DIR);
closedir(DIR);

opendir(DIR, $localCpPath);
my @lFileCheck = readdir(DIR);
closedir(DIR);

my $c1;
foreach (@lFileCheck) {
    print md5_base64($lFileCheck[$c1]);
    print "\n";
    $c1++;
}

my $c2;
foreach (@rFileCheck) {
    print md5_base64($rFileCheck[$c1]);
    print "\n";
    $c2++;
}

Re^2: file comparison
by graff (Chancellor) on Jun 27, 2009 at 00:46 UTC
    You have a ways to go yet. Your foreach loops won't do what you want, for a few different reasons:
    • The parameter you pass to md5_base64 needs to be the data in the file, not the name of the file.
    • When you read a file to get its md5, you need to include the path name with the file name (because readdir only returns the file name, not the path).
    • Rather than just printing the md5s, you should store them, compare them, and print (and/or act on) the results of the comparisons.
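
    The first two points can be sketched with the functional interface the question already uses. This is only a minimal illustration, not the full solution: file_md5 and dir_md5s are hypothetical helper names, and the sketch assumes the files are small enough to read into memory in one go.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Hypothetical helper: base64 MD5 of a file's contents, or undef if the
# file cannot be opened. $dir is a directory, $name a name from readdir.
sub file_md5 {
    my ( $dir, $name ) = @_;
    my $path = "$dir/$name";          # readdir returns the name only
    open( my $fh, '<', $path ) or return;
    binmode $fh;                      # digest the raw bytes
    local $/;                         # slurp mode: read the whole file
    my $data = <$fh>;
    close $fh;
    return md5_base64($data);         # digest the data, not the name
}

# Hypothetical helper: map each plain file name in $dir to its checksum.
sub dir_md5s {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "$dir: $!\n";
    my %sum;
    for my $name ( readdir $dh ) {
        next unless -f "$dir/$name";  # skips ".", ".." and subdirectories
        my $md5 = file_md5( $dir, $name );
        $sum{$name} = $md5 if defined $md5;
    }
    closedir $dh;
    return \%sum;
}
```

    Calling dir_md5s once per directory gives two hashes keyed by file name, which can then be compared entry by entry instead of just printed.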

    Apart from that, your loop usage could be a little better. Also, I think it ends up being easier to use the "object" style interface to Digest::MD5. You still have to open each file, but then you can just pass the file handle to the module.

    Here's an approach that includes checking file size in combination with the md5 checksum, and reports 3 different problem cases that might come up:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Digest::MD5;

    die "Usage: $0 remoteDir localDir\n"
        unless ( @ARGV == 2 and -d $ARGV[0] and -d $ARGV[1] );

    my ( $remote, $local ) = @ARGV;
    my %md5;
    my $digest = Digest::MD5->new();

    # Record "size md5" for every plain file, keyed by file name, then directory.
    for my $dir ( $local, $remote ) {
        opendir DIR, $dir or die "$dir: $!\n";
        while ( my $f = readdir( DIR )) {
            next unless -f "$dir/$f";    # skips ".", ".." and subdirectories
            if ( open( my $fh, "<", "$dir/$f" )) {
                binmode $fh;             # digest the raw bytes
                $digest->new;            # as an instance method, this resets the state
                $digest->addfile( $fh );
                # "-s _" reuses the stat data from the -f test above
                $md5{$f}{$dir} = join( " ", -s _, $digest->b64digest );
            }
            else {
                warn "Open failed for $dir/$f: $!\n";
            }
        }
        closedir DIR;
    }

    # Delete the remote copy only when size and md5 match the local copy.
    for my $file ( sort keys %md5 ) {
        if ( $md5{$file}{$remote} and ! $md5{$file}{$local} ) {
            warn sprintf( "%s: found on remote, not found in local path\n", $file );
        }
        # Note: a file found only in the local path also ends up in this branch.
        elsif ( $md5{$file}{$remote} ne $md5{$file}{$local} ) {
            warn sprintf( "%s: remote/local difference: %s vs. %s\n",
                          $file, $md5{$file}{$remote}, $md5{$file}{$local} );
        }
        else {
            unlink "$remote/$file"
                or warn sprintf( "%s: unable to delete remote copy: %s\n", $file, $! );
        }
    }
    (updated to remove an unnecessary "next" from the latter for loop. Also added error checking when reading the files for their md5s.)
Re^2: file comparison
by Marshall (Canon) on Jun 27, 2009 at 14:55 UTC
    One thing to consider in the code below is that a "directory" is a file. This means that "." and ".." are files too! I congratulate you on using readdir rather than "globbing"; it is much more portable and is the right way to go. I would add a grep to filter the list down to the "real files":
    opendir(DIR, $RemoteSubDirectory);
    my @rFileCheck = grep { -f "$RemoteSubDirectory/$_" } readdir(DIR);
    closedir(DIR);
    Remember that readdir only gives file names, so you have to add the path. Your code:

    opendir(DIR, $RemoteSubDirectory);
    my @rFileCheck = readdir(DIR);
    closedir(DIR);
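
    The filtering idea can also be wrapped in a small subroutine so both directories are read the same way. This is only a sketch: plain_files is a hypothetical name, and it uses a lexical directory handle in place of the bareword DIR.

```perl
use strict;
use warnings;

# Hypothetical helper: sorted names of the plain files in $dir.
# The -f test needs the path prepended, because readdir returns bare names.
sub plain_files {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "Cannot open $dir: $!\n";
    my @names = sort grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @names;
}
```

    With that in place, the two lists would come from plain_files($RemoteSubDirectory) and plain_files($localCpPath), and neither would contain ".", ".." or subdirectories.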