How do I verify that a copied file's contents were not corrupted or modified?

I need something more concrete than File::Compare, diff, or fc.exe, for reasons too inane to delve into. Just know it is political. Anyway, I was thinking of something like an MD5 hash, and I tried using Digest::MD5 to do it. But the hashes it returns differ, even though File::Compare and diff report no differences (on flat text, i.e. 'Hello, World'). Does it get a seed from stat or the directory or something? Or am I doing something wrong, like hammering a screw?
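On the seed question: an MD5 digest depends only on the bytes fed to it, not on stat data, timestamps, or the directory entry, so two files with identical contents should always give identical digests. A minimal sketch of a per-file digest follows. The helper name `file_md5` is made up for illustration, and `binmode` is there on the assumption that some of the boxes are NT, where text-mode reads would translate line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# file_md5 is a hypothetical helper name, not part of Digest::MD5.
# It returns the hex MD5 digest of a file's raw bytes.
sub file_md5 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Could not open $path: $!\n";
    binmode $fh;    # hash the bytes as stored, with no CRLF translation
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Usage: perl md5cmp.pl source dest
if ( @ARGV == 2 ) {
    my ( $src, $dst ) = @ARGV;
    my ( $src_md5, $dst_md5 ) = ( file_md5($src), file_md5($dst) );
    print "$src_md5  $src\n$dst_md5  $dst\n";
    print $src_md5 eq $dst_md5 ? "match\n" : "MISMATCH\n";
}
```

If this prints two different digests for files that diff calls identical, the difference is more likely in how the digest is being requested than in the files themselves.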

The issue stems from a problem with a few boxes I help administer. When copying files over NFS to one host, or via SMB, in either direction between Unix (Samba) and NT (Exceed), the destination file occasionally ends up with the correct file size but with whitespace replacing large chunks of its contents. 95% of the various and mixed boxes exhibit no issues.

The customer (and I, for sanity's sake) want something akin to a CRC or checksum involved in the copy process. See my sample below.

THX
Dex

    #!/usr/bin/perl -w
    # -*-Perl-*-

    use strict;
    use FileHandle;
    use Digest::MD5;

    my $sourceFile = $ARGV[0];
    my $destFile   = $ARGV[1];
    my $inFile     = new FileHandle;
    my $outFile    = new FileHandle;
    my $inMD5      = Digest::MD5->new;
    my $outMD5     = Digest::MD5->new;
    my ( $fileLength, $fileBuffer, $fileOffset );

    $inFile->open ( "<$sourceFile" ) or die "Could not open $sourceFile:$!\n";
    $inMD5->addfile ( $inFile );
    $outFile->open ( ">$destFile" ) or die "Could not open $destFile:$!\n";
    print $inMD5->md5_base64 , "\n";

    # borrowed from "Programming Perl"
    my $blockSize = ( stat $inFile )[11] || 16384;
    while ( $fileLength = sysread $inFile, $fileBuffer, $blockSize ) {
        if ( !defined $fileLength ) {
            next if $! =~ /^Interrupted/;
            die "System read error: $!\n";
        }
        my $fileOffset = 0;
        while ( $fileLength ) {
            my $written = syswrite $outFile, $fileBuffer, $fileLength, $fileOffset;
            die "System write error: $!\n" unless defined $written;
            $fileLength -= $written;
            $fileOffset += $written;
        };
    }
    $outMD5->addfile ( $outFile );
    print $outMD5->md5_base64 , "\n";
    $inFile->close;
    $outFile->close;
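A few things in the sample probably explain the differing hashes, none of them a seed. `md5_base64` is Digest::MD5's plain function, so calling it as a method most likely hashes the stringified object rather than the file data. The object method is `b64digest`. Also, `addfile` reads the input to end-of-file before the copy loop starts, and the output handle is opened write-only, so it cannot be read back for the second digest. Below is a hedged sketch of one way to restructure it: hash each block as it is copied, then reopen the destination and hash it separately. The name `copy_and_verify` and the fixed 16384-byte block size are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;
use Errno qw(EINTR);

# copy_and_verify is a hypothetical helper: copy $src to $dst block by block,
# hashing the bytes as they are read, then hash $dst again by re-reading it.
# Returns (match flag, source digest, destination digest).
sub copy_and_verify {
    my ( $src, $dst ) = @_;

    open my $in,  '<', $src or die "Could not open $src: $!\n";
    open my $out, '>', $dst or die "Could not open $dst: $!\n";
    binmode $in;     # raw bytes on both sides, no CRLF translation
    binmode $out;

    my $in_md5     = Digest::MD5->new;
    my $block_size = 16384;    # assumed block size
    my $buffer;

    while (1) {
        my $length = sysread $in, $buffer, $block_size;
        if ( !defined $length ) {
            next if $! == EINTR;    # interrupted system call: retry
            die "System read error: $!\n";
        }
        last if $length == 0;       # end of file
        $in_md5->add($buffer);      # hash exactly what was read

        my $offset = 0;
        while ($length) {           # handle partial writes
            my $written = syswrite $out, $buffer, $length, $offset;
            die "System write error: $!\n" unless defined $written;
            $length -= $written;
            $offset += $written;
        }
    }
    close $in;
    close $out or die "Could not close $dst: $!\n";

    # Reopen the destination read-only and hash what is actually there.
    open my $check, '<', $dst or die "Could not reopen $dst: $!\n";
    binmode $check;
    my $out_md5 = Digest::MD5->new->addfile($check);
    close $check;

    my ( $src_sum, $dst_sum ) = ( $in_md5->b64digest, $out_md5->b64digest );
    return ( $src_sum eq $dst_sum, $src_sum, $dst_sum );
}

# Usage: perl copyverify.pl source dest
if ( @ARGV == 2 ) {
    my ( $ok, $src_sum, $dst_sum ) = copy_and_verify(@ARGV);
    print "$src_sum\n$dst_sum\n";
    die "Digest mismatch between $ARGV[0] and $ARGV[1]\n" unless $ok;
}
```

One caveat: when the destination is on an NFS or SMB mount, the re-read may be served from the client's cache rather than from the server's disk, so a match here may not prove what the remote host stored. Computing the second digest on the destination host itself would be a stronger check.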

2001-03-14 Edit by Corion: Moved the explanation of the problem up from a reply into the root node.

