Re: Global variables question

Your code is not indented properly and that makes it hard to follow. I personally prefer the newer style of putting the left braces on the next line instead of at the end of the line. Your preference may vary. But no matter which approach you use, the indentation levels should be easy to see, and this does make a difference!

I think you are relatively new to this, so I will explain my thinking while writing some code for you...

This is the first time that I've used the MD5 checksum thing. I downloaded and installed the MD5 module and then considered which format is the easiest to use. I figured that hex was. I did a little hacking to see that this produced what I thought it would. Not to say that base64 is "bad"; use what you want. All that matters here is that you can generate a repeatable string for a file and that it is very unlikely for 2 different files to have the same string.
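Here is roughly the kind of hacking I mean, as a minimal sketch. The input string "hello world" is just sample data I picked for illustration; the three functions are the ones that Digest::MD5 exports.

#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5 md5_hex md5_base64);

my $data = "hello world";      # sample input, an assumption for illustration

my $raw = md5($data);          # 16 raw bytes, not printable
my $hex = md5_hex($data);      # 32 hex digits, easy to print and compare
my $b64 = md5_base64($data);   # 22 characters, shorter but harder to eyeball

printf "raw length:    %d\n", length $raw;
printf "hex length:    %d\n", length $hex;
printf "base64 length: %d\n", length $b64;
print  "hex digest:    $hex\n";

# the same input always produces the same string
print "repeatable\n" if $hex eq md5_hex($data);

Any of the three would work as a comparison key; hex is simply the easiest to read in debug output.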

The next step was to figure out the right data structure. Basically we need the file name (which is unique in a directory) and the fancy checksum (MD5) for that file, which will be a single string consisting of hex values. A hash table instead of an array seemed appropriate.

Then I made a subroutine that takes pathname and returns the hash of name=>checksum. There are some ways of passing back the return values more efficiently, but here it didn't seem to matter.
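By "more efficiently" I mean something like returning a reference to the hash instead of copying the whole hash back to the caller. A minimal sketch: the sub name get_chksums_ref and the two entries in it are made up for illustration, standing in for a real directory scan.

use strict;
use warnings;

# hypothetical stand-in for the real routine: the contents are
# invented sample data, not real checksums
sub get_chksums_ref
{
   my %file2cksum = (
      'a.txt' => 'checksum_of_a',
      'b.txt' => 'checksum_of_b',
   );
   return \%file2cksum;   # one reference is returned, the hash is not copied
}

my $dirA = get_chksums_ref();

foreach my $file (sort keys %$dirA)
{
   print "$file $dirA->{$file}\n";
}

For a directory with a handful of files the difference is not measurable, which is why I didn't bother.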

I'm not sure what the comparison rules are. If, say, dirA is considered the "master dir" and you want to know if any files changed in dirB, loop over the names in dirA and see if the corresponding checksum in dirB matches. Things get more complex if you want to know if any "additional" files are in B that are not in A. Rather than write code, I leave it to you to decide what you need.

#!/usr/bin/perl -w
use strict;
use Digest::MD5 qw(md5 md5_hex md5_base64);
use Data::Dumper;
$| =1;   #turns off output buffering, useful for debugging

#all of these are possible
# $digest = md5($data);
# $digest = md5_hex($data);  #make it easy , use this!
# $digest = md5_base64($data);

my %dirA = get_chksums(".");  #### put real dir name here,
                              #### not "."(current directory)
print Dumper \%dirA;

my %dirB = get_chksums(".");  #### put real dir name here
print Dumper \%dirB;

#####
### put some comparison stuff here
#####

sub get_chksums
{
   my $path = shift;
   my %file2cksum;

   opendir (INDIR, $path) || die "unable to open $path: $!";
   my @files = grep {-f "$path/$_"} readdir INDIR;
   closedir INDIR;   # a directory handle needs closedir, not close

   foreach my $file (@files)
   {
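       open (IN, '<', "$path/$file") 
             || die "unable to open $path/$file: $!"; 
       binmode IN;   # read raw bytes so the checksum is the same on every platform
       $file2cksum{$file} = md5_hex(<IN>);  # list context: all lines are read and concatenated
      # print "$file $file2cksum{$file}\n";  #for debugging...
       close IN;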
   }  
   return %file2cksum;
}
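
For the comparison step left open above, here is one possible sketch, assuming dirA is the "master dir". The two hashes are filled with invented sample data (the file names and the short checksum strings are not real) so that the snippet runs on its own; in the real program they would come from get_chksums.

use strict;
use warnings;

# invented sample data standing in for two real directory scans
my %dirA = ('same.txt' => 'aaa', 'changed.txt' => 'bbb', 'only_in_A.txt' => 'ccc');
my %dirB = ('same.txt' => 'aaa', 'changed.txt' => 'xxx', 'only_in_B.txt' => 'ddd');

my (@missing, @changed);
foreach my $file (sort keys %dirA)
{
   if (!exists $dirB{$file})
   {
      push @missing, $file;      # in A but not in B
   }
   elsif ($dirA{$file} ne $dirB{$file})
   {
      push @changed, $file;      # in both, but contents differ
   }
}

# the "additional" files: names in B that A does not have
my @extra = grep { !exists $dirA{$_} } sort keys %dirB;

print "missing from B: @missing\n";
print "changed in B:   @changed\n";
print "extra in B:     @extra\n";

Whether "missing" and "extra" count as errors depends on your rules, so they are kept in separate lists here.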


Replies are listed 'Best First'.
Re^2: Global variables question by PerlScholar (Acolyte) on Aug 24, 2010 at 14:43 UTC
Hi Marshall, Thank you for your detailed response, very well explained. As you can tell I'm quite new to this, so I will work on my indenting to make my code clearer in the future. Just for my understanding... is there a difference between a hash and a 2D array, or is a hash table more efficient? Also, for the compare part I want to compare the files as you described in the last part of your answer. Would something like this be on the right lines?

my $found = 0;
foreach my $key1 (keys %hash1) {
foreach my $key2 (keys %hash2) {
if ($hash1{$key1} eq $hash2{$key2})
{
$found=1;
}
print "$found";
}
}
Re^3: Global variables question by Marshall (Canon) on Aug 24, 2010 at 16:50 UTC
I figure you still have some indenting work to do. This is very important. Judiciously applied white space is one of the most important things that you can do to improve the readability of your code. Untested, but I figure close to what you want... test, experiment, move forward with the advice you've gotten so far...

my $num_errors = 0;
foreach my $file (keys %hash1)
{
   if (!exists ($hash2{$file}) )
   {
      print "file: $file doesn't exist in 2nd directory\n";
   }
   elsif ($hash1{$file} ne $hash2{$file})
   {
      print "md5 didn't match for $file\n";
      # meaning that file in 2nd directory is not the
      # same as the file in 1st directory
      $num_errors++;
   }
}
print "total errors = $num_errors\n";

Perl "sees" something akin to this (below)... a bit harder to understand than the above? White space is important, variable names are important. I called my hashes %dirA and %dirB instead of %hash1 and %hash2 for a reason! %x is a hash but "x" has no contextual meaning! %dirA is a hash of file names in directory A to checksums. Even %dirA_files_to_checksums would be wayyyyyy better than %hash1. I guess %dir1 is also ok. The % means hash - give more contextual information!

my $num_errors=0;foreach my $file (keys %hash1){if (!exists ($hash2{$file})){print "file: $file doesn't exist in 2nd directory\n";}elsif ($hash1{$file} ne $hash2{$file}){print "md5 didn't match for $file\n";$num_errors++;}}print "total errors = $num_errors\n";

PS: Yes, a hash table for this purpose is going to be WAY more efficient than an array.
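On the hash versus array question, a small sketch of the difference. The file names and checksum strings are invented sample data, and lookup_in_array is a hypothetical helper written only for this comparison.

use strict;
use warnings;

# the same sample data held two ways: a 2D array of [name, checksum]
# pairs, and a hash of name => checksum
my @pairs = (['a.txt', '111'], ['b.txt', '222'], ['c.txt', '333']);
my %cksum = map { $_->[0] => $_->[1] } @pairs;

# with the array, finding one file means walking the list until it turns up
sub lookup_in_array
{
   my ($name) = @_;
   foreach my $pair (@pairs)
   {
      return $pair->[1] if $pair->[0] eq $name;
   }
   return;   # not found
}

my $from_array = lookup_in_array('c.txt');
my $from_hash  = $cksum{'c.txt'};   # with the hash, the name is the index

print "array scan gives:  $from_array\n";
print "hash lookup gives: $from_hash\n";

Both give the same answer; the hash just gets there without a loop, which is also why one loop with exists is enough for the comparison and a loop inside a loop is not needed.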
Re^4: Global variables question by PerlScholar (Acolyte) on Aug 24, 2010 at 22:36 UTC
Thanks Marshall, By the way I was not planning to use %hash1 etc. in my code; it was just something I quickly scribbled down to test my logic, but thanks! Will work on my indenting too! You've been a great help.
Re^5: Global variables question by Marshall (Canon) on Aug 26, 2010 at 18:43 UTC
Re^6: Global variables question by PerlScholar (Acolyte) on Aug 31, 2010 at 11:48 UTC