deedo has asked for the wisdom of the Perl Monks concerning the following question:

Hello

This is my first time posting despite having been around for a while :) Usually I get there on my own with the problems I face in learning Perl. Unfortunately, I have come across a problem with calculating the MD5 of each file in a directory and comparing it with the MD5 stored in a file of the same name, plus an .md5 extension, in the same directory. Essentially the directory structure looks like this:

Directory - file, file.md5, file2, file2.md5, file3, file3.md5... etc

The code I've put together so far is limited, but it can calculate the MD5 and write to a log if there is an error. I just don't know how to read the original MD5 from the corresponding .md5 file and match it against the calculated one. The code I've written so far is:

    use strict;
    use warnings;
    use Digest::MD5;

    my $sourcedirectory = '//path/to/directory';

    &main();

    sub main {
        # creating file array
        my @files = &getfiles($sourcedirectory);

        # removal of the two directory structures which form the start of the file array
        my $rmdir  = shift (@files);
        my $rmdir2 = shift (@files);

        # capture of a count of the number of files remaining in the array
        my $filecount = @files;

        # check to see if there are no files, if none, exit script
        if ($filecount == 0) {
            print "\nNo files to process\n\n";
            exit;
        }
        # where there are files, the routine subroutine is called for each item individually
        else {
            foreach my $item (@files) {
                my $filepath = "$sourcedirectory/$item";
                &routine($filepath, $item);
            }
        }
    }

    sub routine {
        # identify filepath and name as parameters passed to routine
        my $file = shift;
        my $name = shift;
    }

    # subroutine to process the MD5 Hex value of a filehandle passed
    sub processmd5 {
        my $io_handle = shift;
        my $md5 = Digest::MD5->new;
        $md5->addfile($io_handle);
        my $value = $md5->hexdigest;
        return $value;
    }

    # logreport subroutine that receives a string message as an argument, and processes the log entry
    sub logmd5 {
        # message, and definition of date/time variables to enable logging
        my $md5 = shift;
        my $md5filename = shift;
        my $md5file = "$sourcedirectory/Log.txt";

        # opening of log and appending of md5 to the log
        open(LOG, '>>', $md5file) or die "Can't open logfile $!";
        print LOG "ERROR: $md5filename does not match - $md5 is not the original checksum";
        close LOG;
    }

    # a subroutine to enable definition of files to be processed
    sub getfiles {
        my $sourcefiles = shift;
        my @file_list;
        opendir(DIR, $sourcefiles) or die "Can't open directory, $!\n";
        @file_list = readdir(DIR);
        closedir(DIR);
        return @file_list;
    }

I understand there is a big chunk missing in the 'routine' subroutine, but if any of you could give me some guidance on how to proceed with processing the file array returned by the getfiles subroutine, it would be greatly appreciated. If I am going about this totally the wrong way, please point that out. Many thanks in advance!


Replies are listed 'Best First'.
Re: Checking MD5 of files in directory with corresponding MD5 file type
by Corion (Patriarch) on Dec 30, 2016 at 09:53 UTC

    You even have the logic to process the items from the getfiles subroutine already in place.

    Maybe start with the following implementation for routine to see what happens:

        sub routine {
            # identify filepath and name as parameters passed to routine
            my $file = shift;
            my $name = shift;
            print "routine called with '$file', '$name'\n";
        }

    The next thing would be to give it a proper name. Find out what the routine should do, and then rename it to that.

    Good names could be process_file or verify_md5 maybe. Or maybe the routine should do something different, but then its name should be different as well.

      Thanks Corion... I think where I am struggling the most is how to process two files in tandem from the array, i.e. file and file.md5.

      What I am trying to achieve is to generate the MD5 of 'file' and compare it with the MD5 that was already generated by an instrument prior to transfer and stored in a corresponding file called file.md5.

      How to work with these two files together, but separately from the rest of the array, is what is defeating me at the moment.

        Why do you want to process both, "file" and "file.md5"?

        I think you want to do something different with "file" than you want to do with "file.md5".

        Also note that if you know "file", you also know "file.md5", and if "file.md5" is not found that is an error.

        You can blindly open "file.md5" without needing to verify its existence beforehand.
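        A minimal sketch of that idea, assuming each .md5 file holds just the hex digest on its first line (the thread does not confirm the exact format). The helper name verify_md5 is hypothetical and only borrows one of the names suggested above:

```perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper: returns 1 if the digest of $file equals the first
# line of "$file.md5", 0 if it differs, and undef if either file cannot
# be opened.
sub verify_md5 {
    my ($file) = @_;

    # Digest of the data file itself, read in binary mode.
    open my $fh, '<', $file or return;
    binmode $fh;
    my $actual = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;

    # No existence test first: a failed open already says the .md5 is missing.
    open my $md5_fh, '<', "$file.md5" or return;
    my $expected = <$md5_fh>;
    close $md5_fh;

    # Assumption: the digest is the whole first line; strip the line ending.
    $expected = '' unless defined $expected;
    $expected =~ s/\s+\z//;

    return $actual eq $expected ? 1 : 0;
}
```

        Because "file.md5" is derived from "file", the caller only needs to loop over the data files and can skip every name ending in .md5.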

Re: Checking MD5 of files in directory with corresponding MD5 file type
by james28909 (Deacon) on Dec 31, 2016 at 13:52 UTC
    I'm not sure how your md5 file is set up. Mine just holds known MD5s, one per line, with no filenames, because a known MD5 should be sufficient to check against. I loop through each file in a directory, get each file's MD5, and compare that against the MD5s stored in the md5 file, like so:
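
        use warnings;
        use strict;
        use Digest::MD5;
        use File::Slurp;

        # known MD5s, one per line, joined into a single string to search
        my @md5_file   = read_file("md5");
        my $md5_string = join( "", @md5_file );
        my $dirname    = "path/to/files/to/be/checked";

        for my $file ( glob("$dirname/*") ) {
            # skip subdirectories and anything else that is not a plain file
            next unless -f $file;

            open( my $fh, '<', $file ) or die "Can't open $file: $!";
            binmode($fh);
            my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
            close($fh);

            # the file's digest is the pattern, the list of known MD5s is searched
            if ( $md5_string =~ $md5 ) {
                print "$md5 matches!\n";
            }
            else {
                print "$md5 doesnt match ;(\n";
            }
        }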

    Even though it looks like this code works as is, I am not sure if it does, and I don't have the time to test it myself right now, but hopefully it will help you a little bit on your way :)

    Happy New Year!

      To save creating a completely new thread, I am going to ask a question here, under the code I posted, and hope someone knowledgeable reads it.

      In the comparison operation 'if ( $md5_string =~ $md5 ){', would it be better to write it as 'if ( $md5 =~ $md5_string ){'?

        Think about it, what do the strings contain? Will "md2" =~ "md1\nmd2\nmd3\n" match?
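
        To make the hint concrete, here is a small sketch using the same placeholder strings. The variable names other than $md5 and $md5_string are illustrative, and the hash lookup at the end is an alternative technique, not something from the posted code:

```perl
use strict;
use warnings;

my $md5_string = "md1\nmd2\nmd3\n";   # contents of the md5 file, joined
my $md5        = "md2";               # digest of the file being checked

# The right-hand side is the pattern, the left-hand side is the text searched.
my $list_searched   = ( $md5_string =~ /\Q$md5\E/ ) ? 1 : 0;   # "md2" occurs in the list
my $digest_searched = ( $md5 =~ /\Q$md5_string\E/ ) ? 1 : 0;   # the whole list cannot occur in "md2"

# An exact lookup avoids accidental substring matches altogether.
my %known = map { $_ => 1 } split /\n/, $md5_string;
my $exact = exists $known{$md5} ? 1 : 0;

print "list =~ digest: $list_searched\n";     # prints 1
print "digest =~ list: $digest_searched\n";   # prints 0
print "hash lookup:    $exact\n";             # prints 1
```

        So the original order, with the list on the left and the single digest on the right, is the one that can match.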
Re: Checking MD5 of files in directory with corresponding MD5 file type
by deedo (Novice) on Jan 06, 2017 at 11:48 UTC
    In case anyone wants to know the "final" code I used, here it is. Thanks to Corion for a lot of guidance along the way! A little tidying up is still to be done...

        use strict;
        use warnings;
        use Digest::MD5;

        my $sourcedirectory = '//path/to/directory';

        main();

        sub main {
            # creating file array (getfiles returns plain files only, so the
            # '.' and '..' directory entries are already excluded)
            my @files = getfiles($sourcedirectory);

            # check to see if there are no files, if none, exit script
            if (!@files) {
                print "\nNo files to process\n\n";
                exit;
            }

            # where there are files, the routine subroutine is called for each
            # item individually (except for md5 file types)
            foreach my $item (@files) {
                next if $item =~ /\.md5\z/;
                my $filepath = "$sourcedirectory/$item";
                routine($filepath, $item);
            }
        }

        sub routine {
            # identify filepath and name as parameters passed to routine
            my $file = shift;
            my $name = shift;

            # open file, set to binary mode, process MD5 and close
            open my $fh, '<', $file or die "Can't open file: $!";
            binmode($fh);
            my $md5tocompare = processmd5($fh);
            close $fh or die "Can't close file successfully: $!";

            # open file.md5 and read its first line; skip the file if there is no file.md5
            open my $fh2, '<', "$file.md5" or return;
            my $originalmd5 = <$fh2>;
            close $fh2 or die "Can't close file successfully: $!";

            # strip the line ending, otherwise the comparison below never matches
            $originalmd5 = '' unless defined $originalmd5;
            $originalmd5 =~ s/\s+\z//;

            # check parity
            print "MD5 is $md5tocompare against original MD5 of $originalmd5\n";
            return if $md5tocompare eq $originalmd5;
            logmd5($md5tocompare, $name);
        }

        # subroutine to process the MD5 Hex value of a filehandle passed
        sub processmd5 {
            my $io_handle = shift;
            my $md5 = Digest::MD5->new;
            $md5->addfile($io_handle);
            my $value = $md5->hexdigest;
            return $value;
        }

        # logreport subroutine that receives the calculated md5 and the file name
        # as arguments, and processes the log entry
        sub logmd5 {
            my $md5         = shift;
            my $md5filename = shift;
            my $md5file     = "$sourcedirectory/Log.txt";

            # opening of log and appending of md5 to the log
            open(my $log, '>>', $md5file) or die "Can't open logfile $!";
            print {$log} "ERROR: $md5filename does not match - $md5 is not the original checksum\n";
            close $log;
        }

        # a subroutine to enable definition of files to be processed
        sub getfiles {
            my $sourcefiles = shift;
            opendir(my $dir, $sourcefiles) or die "Can't open directory, $!\n";
            # readdir order is not guaranteed, so filter for plain files
            # instead of shifting off the first two entries
            my @file_list = grep { -f "$sourcefiles/$_" } readdir($dir);
            closedir($dir);
            return @file_list;
        }

    I also have a script to generate the MD5 files if anyone needs it... just message me. Thanks!