ftumsh has asked for the wisdom of the Perl Monks concerning the following question:

Lo, the code below gives different results if the binmode statement is removed. All the code does is print the md5sum of a file using Digest::MD5. When I remove the CR from the output file, the md5 agrees with the md5 of the filehandle in the script. So it looks like the md5 taken via the filehandle is somehow getting the file contents without the effect of the crlf layer. Any ideas? I have a workaround, but it has piqued my curiosity...
The general logic is:
1) Open file, autoflush
2) binmode crlf
3) Write "Hello World\n"
4) seek 0,0
5) Get md5 using the filehandle
6) Get md5 by opening the file
#! /usr/bin/perl -w
use strict;
use diagnostics;
use File::Temp();
use File::Copy;

my $FH = File::Temp->new(
    TEMPLATE => 'tmpXXXXXXXXXX',
    SUFFIX   => '.xml_out',
    DIR      => '/tmp',
    UNLINK   => 1
);
$FH->autoflush(1);
binmode( $FH, ':crlf' );
print $FH "Hello World\n";

my $tgt_file = '/tmp/JD';
copy( $FH->filename, $tgt_file );

require Digest::MD5;
my $md5;
$FH->seek(0,0);
$md5 = Digest::MD5->new->addfile( $FH );
print 'by FH: ',$md5->hexdigest,"\n";

my $md52;
open(FILE, $tgt_file) or die "Can't open : $!";
binmode(FILE);
$md52 = Digest::MD5->new->addfile(*FILE);
close FILE;
print 'by FILE1: ',$md52->hexdigest,"\n";

my $md53;
open(FILE, $FH->filename) or die "Can't open : $!";
binmode(FILE);
$md53 = Digest::MD5->new->addfile(*FILE);
close FILE;
print 'by FILE2: ',$md53->hexdigest,"\n";

Re: binmode layer and seek
by moritz (Cardinal) on Jul 08, 2008 at 12:05 UTC
    That's what IO-Layers generally do - they change the read and written data on the fly.

    The :crlf IO layer adds a CR before each LF when you write, and strips it again when you read.

    So a "raw" file handle without such a layer will read a different result, thus producing a different MD5 sum.
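
    A minimal sketch of that round trip, assuming a Unix-like system where a freshly opened handle has no :crlf layer by default. The scratch file comes from File::Temp's tempfile, and the variable names are only illustrative:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration; File::Temp picks the name.
my ($fh, $path) = tempfile(UNLINK => 1);

# Write one line through the :crlf layer.
binmode($fh, ':crlf');
print {$fh} "Hello World\n";
close $fh or die "close failed: $!";

# Read it back through :crlf: the layer translates CRLF back to "\n".
open my $cooked, '<:crlf', $path or die "Can't open $path: $!";
my $via_layer = do { local $/; <$cooked> };
close $cooked;

# Read it back raw: these are the bytes actually stored in the file.
open my $raw, '<:raw', $path or die "Can't open $path: $!";
my $on_disk = do { local $/; <$raw> };
close $raw;

printf "via :crlf: %d bytes\n", length $via_layer;
printf "raw:       %d bytes\n", length $on_disk;
```

    The raw read should come out one byte longer than the layered read, because the CR is stored in the file but hidden by the layer.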

    If you want to build checksums, always read the data as binary, without any additional IO layers.
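
    One way to follow that advice is a small helper that always opens the file with :raw before handing it to Digest::MD5. This is only a sketch; md5_of_file is a hypothetical name, not something from the original script, and the temp file is just there to exercise it:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempfile);

# Hypothetical helper: MD5 of the bytes exactly as stored on disk.
sub md5_of_file {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Exercise it on a scratch file that contains a CRLF line ending.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out);                      # raw: write the bytes unchanged
print {$out} "Hello World\r\n";
close $out or die "close failed: $!";

my $from_file   = md5_of_file($path);
my $from_string = md5_hex("Hello World\r\n");
print $from_file eq $from_string ? "digests agree\n" : "digests differ\n";
```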

    BTW, seek is totally unrelated here: if you close the file and open it again with the :crlf IO layer, you'll get the same result as with seek(0,0).
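
    A short sketch to check that claim, again assuming a Unix-like system and a File::Temp scratch file (variable names are illustrative). It takes the digest once after rewinding the original handle and once after closing and reopening with :crlf:

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ':crlf');
print {$fh} "Hello World\n";

# Variant 1: rewind the same handle (seek also flushes the write buffer).
seek($fh, 0, 0) or die "seek failed: $!";
my $after_seek = Digest::MD5->new->addfile($fh)->hexdigest;
close $fh or die "close failed: $!";

# Variant 2: open the file again, this time explicitly with :crlf.
open my $again, '<:crlf', $path or die "Can't open $path: $!";
my $after_reopen = Digest::MD5->new->addfile($again)->hexdigest;
close $again;

print $after_seek eq $after_reopen ? "same digest\n" : "different digests\n";
```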

      Ah, I see. Thanks for the explanation. What I didn't understand was why the layer was working for one thing and not another. I've had a look at MD5.pm and it's using XSLoader, which I presume means MD5.pm is using C and therefore bypassing the layer. Having said that, if it were bypassing the layer I'd expect it to give the correct result.
      John
        MD5.pm doesn't bypass anything. If it did, you'd get the same result for all three outputs.

        Here is, in more detail, what happens:

        • You open a file, and apply the crlf IO layer.
        • You write a "\n" to that file. The crlf layer converts that to CRLF.
        • MD5.pm reads from the very same filehandle. That means that the CRLF is converted to "\n" again upon reading. MD5.pm interprets that as binary data, and thus as LF.
        • You close the file, and open it again, this time without any layer.
        • MD5.pm reads from that file, and this time the line ending comes out as CRLF, because no IO layer converts anything. MD5.pm computes a hash, which is different from before because the source data is different.
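
        The steps above can be sketched as a stripped-down version of the original script. It assumes a Unix-like system and compares each digest against md5_hex of the string with and without the CR; the scratch file and variable names are illustrative only:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use File::Temp qw(tempfile);

# Open a scratch file and apply the :crlf layer.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh, ':crlf');

# The layer turns "\n" into CRLF on the way out.
print {$fh} "Hello World\n";

# Read from the very same handle: the layer turns CRLF back into "\n".
seek($fh, 0, 0) or die "seek failed: $!";
my $by_fh = Digest::MD5->new->addfile($fh)->hexdigest;

# Open the file again without any translating layer: CRLF stays CRLF.
open my $raw, '<:raw', $path or die "Can't open $path: $!";
my $by_file = Digest::MD5->new->addfile($raw)->hexdigest;
close $raw;

print "by FH:   $by_fh\n";
print "by FILE: $by_file\n";
print "FH digest is that of the LF string:     ",
    ($by_fh   eq md5_hex("Hello World\n")   ? "yes" : "no"), "\n";
print "FILE digest is that of the CRLF string: ",
    ($by_file eq md5_hex("Hello World\r\n") ? "yes" : "no"), "\n";
```
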
Re: binmode layer and seek
by Anonymous Monk on Jul 08, 2008 at 11:46 UTC
    Your logic is faulty (never binmode crlf)
    read this