Random_Walk has asked for the wisdom of the Perl Monks concerning the following question:

I am calculating MD5 sums for files on a Unix distribution server and on Windows clients to check that they are the same. This works fine for gzip files, tar files and plain text, but for .exe files the checksums differ.

    # on 'doze..
    # This is perl, v5.8.6 built for MSWin32-x86-multi-thread
    perl -le "use Digest::MD5 ('md5_base64'); print md5_base64(<>);" perl.exe
    hoHYr/QOR7z7Pre69QdMOg

    D:\perl\bin>ftp tivprehub1
    Connected to tivprehub1.
    <snip>
    230 User ******* logged in.
    ftp> bin
    200 Type set to I.
    ftp> put perl.exe
    200 PORT command successful.
    150 Opening data connection for perl.exe.
    226 Transfer complete.
    41033 bytes sent in 0.33 seconds (124.72 Kbytes/sec)
    ftp> bye
    221 Goodbye.

    # *nix
    # This is perl, v5.8.0 built for aix-thread-multi
    nph>perl -le'use Digest::MD5 ("md5_base64"); print md5_base64(<>);' perl.exe
    +QpV/tbZNRzK5aIPqc18tQ

    # now I gzip the exe at both ends...
    # 'doze
    perl -le "use Digest::MD5 ('md5_base64'); print md5_base64(<>);" perl.exe.gz
    QinlhnAssDofoVV07khg1g
    # *nix
    nph>perl -le'use Digest::MD5 ("md5_base64"); print md5_base64(<>);' perl.exe.gz
    QinlhnAssDofoVV07khg1g

Does anyone have a clue why this might be? I have a workaround, gzipping the executables I want to compare and comparing those checksums instead, but I would rather not. I am imagining that little paperclip popping up: 'Looks like you are trying to open an executable...'

Cheers,
R.

Pereant, qui ante nos nostra dixerunt! (May those who said our things before us perish!)

Re: md5 sum different on windows and unix for win.exe files !!?
by demerphq (Chancellor) on Apr 06, 2005 at 15:03 UTC

    While you are binmoding the FTP transfer, it doesn't look like you are binmoding the file you read in. On Win32 this means that the first ^Z encountered in the file ends the file, and it also means that Perl performs CRLF conversions before the data is seen by your code. So the MD5 code on the two platforms is actually looking at different data. (Printing the length of the data read would have shown this immediately.) The answer, of course, is that if you are interested in the byte-level contents of a file (which you are), then you MUST binmode the file first, irrespective of the OS.
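    The CRLF half of this can be seen on any platform by pushing the :crlf PerlIO layer explicitly, since it performs the same read-side translation that Perl applies by default on Win32. A minimal sketch, assuming made-up four-byte sample data written to a temporary file; it does not reproduce the ^Z truncation, only the line-ending conversion:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);
use File::Temp qw(tempfile);

# Made-up sample data: four bytes containing one CRLF pair.
my $bytes = "a\r\nb";

# Write the bytes to a temporary file without any translation.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $bytes;
close $out or die "close $path: $!";

# Slurp a file back through the given PerlIO layer.
sub slurp_with_layer {
    my ($layer, $file) = @_;
    open my $in, "<$layer", $file or die "open $file: $!";
    local $/;                    # undef $/ = read the whole file at once
    my $data = <$in>;
    close $in;
    return $data;
}

my $raw  = slurp_with_layer(':raw',  $path);   # the bytes as stored
my $text = slurp_with_layer(':crlf', $path);   # CRLF turned into LF on read

printf "raw:  %d bytes  %s\n", length $raw,  md5_base64($raw);
printf "crlf: %d bytes  %s\n", length $text, md5_base64($text);
```

    The translated read is one byte shorter, so its digest differs even though the file on disk is unchanged, which is the same effect as the text-mode read of perl.exe.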

    Note that the common folklore about not needing to binmode files on Unix is not correct in these modern days of UTF-8 and wide characters. If you need the raw contents of a file, you should binmode it regardless of operating system, since an encoding layer could otherwise change what you read.

    With binmode

        D:\>perl -le "use Digest::MD5 ('md5_base64'); local $/; open my $fh,shift or die $!; binmode $fh if shift; $file=<$fh>; print qq(Bytes: ),length($file),qq( ),md5_base64($file); " E:\Perl\bin\perl.exe 1
        Bytes: 20540 vzpQPjhNDRjGpJccEa4iMw

    Without binmode

        D:\>perl -le "use Digest::MD5 ('md5_base64'); local $/; open my $fh,shift or die $!; binmode $fh if shift; $file=<$fh>; print qq(Bytes: ),length($file),qq( ),md5_base64($file); " E:\Perl\bin\perl.exe
        Bytes: 8295 ZFaA5qHXAClj0d1czg6hQA
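    Applying that binmode advice outside a one-liner, Digest::MD5's object interface has an addfile method that reads from a handle you have already put in binmode, so the file does not have to be slurped into memory first. A sketch, assuming the file names come from the command line; the script name in the usage comment is hypothetical:

```perl
use strict;
use warnings;
use Digest::MD5;

# Base64 MD5 of a file's raw bytes, the same on Windows and Unix.
sub file_md5_base64 {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;                       # no CRLF, ^Z or encoding handling
    my $b64 = Digest::MD5->new->addfile($fh)->b64digest;
    close $fh;
    return $b64;
}

# Usage (md5file.pl is a hypothetical script name): perl md5file.pl perl.exe
print "$_: ", file_md5_base64($_), "\n" for @ARGV;
```

    The Digest::MD5 documentation itself recommends putting the handle in binmode before calling addfile.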

    PS: I surmise that gzip files have some logic built in that makes them avoid special characters like ^Z, \0 and CRLF, to avoid this type of issue in the first place. I have no idea whether this is correct.

    ---
    demerphq

      demerphq++

      Thank you so much, that was just what was needed ...

          # 'doze
          perl -e "use Digest::MD5('md5_base64');local $/;open $f,'perl.exe'; binmode $f; $dat=<$f>;print md5_base64($dat)"
          +QpV/tbZNRzK5aIPqc18tQ

          # *nix
          perl -le'use Digest::MD5 ("md5_base64"); local $/;open $f, "perl.exe"; binmode $f; $dat=<$f>;print md5_base64($dat);' perl.exe
          +QpV/tbZNRzK5aIPqc18tQ

          # and the reason I was being lazy and using md5_base64(<FH>)
          perl -le'use Digest::MD5 ("md5_base64"); @array=qw(a b c); print md5_base64(@array);'
          kAFQmDzST7DWlj99KOF/cg
          perl -le'use Digest::MD5 ("md5_base64"); @array=qw(a b c); $array=join "",@array; print md5_base64($array);'
          kAFQmDzST7DWlj99KOF/cg

          # thought that worked, but then did this...
          perl -le'use Digest::MD5 ("md5_base64"); local $/;open $f, "perl.exe"; $data=<$f>;print md5_base64($data);' perl.exe
          +QpV/tbZNRzK5aIPqc18tQ
          perl -le'use Digest::MD5 ("md5_base64"); local $/;open $f, "perl.exe"; binmode $f; print md5_base64($f);' perl.exe
          MzTg3hOPoYjcGlfa8CHjeg
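      The last command differs because md5_base64($f) is handed the filehandle itself, not its contents: Perl stringifies the handle to something like GLOB(0x...), and that short string is what gets hashed. A small sketch showing this; the in-memory handle and its contents are only stand-ins for perl.exe:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

my $contents = "pretend these are the bytes of perl.exe";
open my $fh, '<', \$contents or die "open: $!";   # in-memory filehandle
binmode $fh;

my $of_handle = md5_base64($fh);     # hashes the string "GLOB(0x...)"
my $of_name   = md5_base64("$fh");   # the same string, made explicit
my $slurped   = do { local $/; <$fh> };
my $of_data   = md5_base64($slurped);

print "handle stringifies as: $fh\n";
print "handle: $of_handle\nstring: $of_name\ndata:   $of_data\n";
```

      The fix is to read the data first (or use the object interface's addfile, which does take a handle).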

      Cheers,
      R.


        You can do that a little more easily:

        perl -MDigest::MD5=md5_base64 -0777 -e"binmode STDIN; print md5_base64 <STDIN>" <\bin\perl.exe

        Note the '<' before the filename: the file is redirected onto STDIN, which is why the code binmodes STDIN.
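        In that one-liner, -0777 makes Perl read input whole (the same as undefining $/), and binmode STDIN switches the redirected file to raw bytes. A longhand sketch of the same idea; the sub name is hypothetical and, to keep the demo self-contained, it is fed an in-memory handle rather than a real STDIN:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

# Longhand of: perl -MDigest::MD5=md5_base64 -0777 -e"binmode STDIN; print md5_base64 <STDIN>"
sub stream_md5_base64 {
    my ($fh) = @_;
    binmode $fh;         # raw bytes, as "binmode STDIN" does
    local $/;            # slurp mode, as -0777 does
    my $all = <$fh>;
    return md5_base64(defined $all ? $all : '');
}

# Real use would be: print stream_md5_base64(\*STDIN), "\n";
my $sample = "line one\r\nline two\r\n\x1A trailing bytes";
open my $in, '<', \$sample or die "open: $!";
print stream_md5_base64($in), "\n";
```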


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        Lingua non convalesco, consenesco et abolesco.
        Rule 1 has a caveat! -- Who broke the cabal?
Re: md5 sum different on windows and unix for win.exe files !!?
by polettix (Vicar) on Apr 06, 2005 at 14:53 UTC
    It's weird that this works for gzip files, indeed. But I suspect the problem is the -l command-line option: I'd probably get rid of it and, as an added bonus, call binmode(STDIN) before the print. Chances are it's a problem related to the different ways Windows and *nix handle newlines.

    Moreover, I see that md5_base64 expects a scalar as input, so I suspect that you actually wanted to activate slurp mode in order to read the whole file at one time. In this case, you should set $/ to undef before reading.
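    For comparison, the Digest::MD5 documentation describes the functional md5* routines as concatenating all of their arguments before hashing (the qw(a b c) experiment in this thread shows the same), so a list of lines and a slurped scalar give the same digest as long as both come from a binmoded handle. A sketch comparing the two reads, assuming made-up sample data:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);

my $data = "first\nsecond\r\nthird";   # made-up file contents

# Line-by-line read in list context, like md5_base64(<>).
open my $lines_fh, '<', \$data or die "open: $!";
binmode $lines_fh;
my @lines = <$lines_fh>;

# Slurp read with $/ undefined, as suggested above.
open my $slurp_fh, '<', \$data or die "open: $!";
binmode $slurp_fh;
my $whole = do { local $/; <$slurp_fh> };

print scalar(@lines), " lines\n";
print md5_base64(@lines), "\n";
print md5_base64($whole), "\n";
```

    Slurping is still the clearer way to say "hash the whole file", but the list form was not the cause of the mismatch; the missing binmode was.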

    Flavio (perl -e "print(scalar(reverse('ti.xittelop@oivalf')))")

    Don't fool yourself.