PerlMonks  

Re: File::Temp: 2 interfaces get different results with Digest::MD5 and File::Compare

by Corion (Patriarch)
on Aug 29, 2021 at 17:17 UTC


in reply to File::Temp: 2 interfaces get different results with Digest::MD5 and File::Compare

Without replicating your situation, did you look at whitespace differences? I see that you're neither setting :raw on the file handles nor using binmode on them, so that would be the first place I'd look.

Re^2: File::Temp: 2 interfaces get different results with Digest::MD5 and File::Compare
by jkeenan1 (Deacon) on Aug 29, 2021 at 17:30 UTC
    There is no whitespace in the strings being printed to the tempfiles.

    Which filehandles are you referring to? The File::Temp filehandles or the handle inside hexdigest_one_file()?

    (Note: I did try binmode $FH inside that subroutine's definition. It made no difference.)

    Just now I tried: open my $FH, "<:raw", $filename or croak "Unable to open $filename for reading";. That did not make any difference, either.

    Jim Keenan

      You're using say instead of print, so whitespace certainly is involved.

      You disabled the layers on reading the data back but did you disable the layers when writing the file? I think you're usually on Windows and there, Perl (and say) will usually output \r\n to files.
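      A minimal sketch of the layer effect described above, not code from the original post: it writes one line with say through a given layer and shows the bytes that land on disk. The helper name bytes_written is hypothetical, and the :crlf layer is pushed explicitly so the effect also shows up on Unix.

```perl
#!perl
use strict;
use warnings;
use feature 'say';
use File::Temp qw( tempfile );

# Hypothetical helper: say() one line through $layer, then return the
# file's contents as a hex string.
sub bytes_written {
    my ($layer, $text) = @_;
    my ($fh, $name) = tempfile( UNLINK => 1 );
    binmode $fh, $layer or die "binmode $layer failed: $!";
    say {$fh} $text;
    close $fh or die "Unable to close $name: $!";

    # Read back with :raw so no layer translates the bytes on the way in.
    open my $in, '<:raw', $name or die "Unable to open $name: $!";
    my $bytes = do { local $/; <$in> };
    close $in or die "Unable to close $name: $!";
    return unpack 'H*', $bytes;
}

say ':raw  -> ', bytes_written(':raw',  'x');   # 780a   (x, LF)
say ':crlf -> ', bytes_written(':crlf', 'x');   # 780d0a (x, CR, LF)
```

      The extra 0d byte per line is what would change both the file size and the digest, which is why comparing file sizes is a quick first check.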

      Update: On further inspection, the file sizes of the two files are identical, so there is something else afoot. Sorry for the noise.

      I looked at replicating your situation using IO layers, but while I can provoke a difference using the :crlf filehandle, I don't get the digests you posted:

      #!perl
      use 5.14.0;
      use strict;
      use warnings;
      use Carp;
      use Data::Dumper;
      use Digest::MD5;
      use File::Compare (qw| compare |);
      use File::Temp qw( tempfile );
      use Test::More tests => 1;

      my $basic = 'x' x 10**2;
      my @digests;

      my ($fh1, $t1) = tempfile();
      binmode $fh1, ':raw';
      for (1..100) { say $fh1 $basic }
      close $fh1 or croak "Unable to close $t1 after writing";
      push @digests, hexdigest_one_file($t1);
      diag "$t1: $digests[0]";

      my $t3 = File::Temp->new( UNLINK => 0);
      binmode $t3, ':crlf';
      for (1..100) { say $t3 $basic }
      close $t3 or croak "Unable to close $t3 after writing";
      push @digests, hexdigest_one_file($t3);
      diag "$t3: $digests[1]";

      is $digests[0], $digests[1];

      sub hexdigest_one_file {
          my $filename = shift;
          say "Filename: $filename";
          #open my $FH, '<', $filename or croak "Unable to open $filename for reading";
          #print for <$FH>;
          #close $FH;
          my $state = Digest::MD5->new();
          open my $FH, '<:raw', $filename or croak "Unable to open $filename for reading";
          $state->addfile($FH);
          close $FH or croak "Unable to close $filename after reading";
          return $state->hexdigest;
      }

      1..1
      Filename: /tmp/MdfRQx3DVl
      # /tmp/MdfRQx3DVl: e395fd01f84d7d1006a99e2a6b8fb832
      Filename: /tmp/x589MI1yYB
      # /tmp/x589MI1yYB: 7651c6edc9ebdcfa617bcc99e1c8a6f2
      not ok 1
      #   Failed test at tmp.pl line 29.
      #          got: 'e395fd01f84d7d1006a99e2a6b8fb832'
      #     expected: '7651c6edc9ebdcfa617bcc99e1c8a6f2'
      # Looks like you failed 1 test of 1.

      Update 2: Have you asked md5sum which sum is correct? For my code, md5sum outputs hashes identical to what Perl computes for each file.

        Corion,

        Based on your suggestion, I developed the workaround below. The trick seems to have three parts to it:

        1. binmode $FH, ':raw': binmode the tempfile(handle) before writing to it. (I suspect that on Unix, we can get away without the ':raw', but whatever.)

        2. close $FH: close the tempfile(handle) after writing to it and before calling hexdigest on it.

        3. Ignore File::Compare::compare() for now. (I don't need it for my real-world problem, anyway.)

        #!perl
        use 5.14.0;
        use warnings;
        use Carp;
        use Data::Dumper;
        use Digest::MD5;
        use File::Temp qw( tempfile );
        use Test::More;

        sub hexdigest_one_file {
            my $filename = shift;
            say "Filename: $filename";
            my $state = Digest::MD5->new();
            open my $FH, '<', $filename or croak "Unable to open $filename for reading";
            $state->addfile($FH);
            close $FH or croak "Unable to close $filename after reading";
            return $state->hexdigest;
        }

        my $basic = 'x' x 10**2;
        my @digests;

        my ($fh1, $t1) = tempfile();
        binmode $fh1, ':raw';
        for (1..100) { say $fh1 $basic }
        close $fh1 or croak "Unable to close $t1 after writing";
        push @digests, hexdigest_one_file($t1);

        my $t3 = File::Temp->new( UNLINK => 0);
        binmode $t3, ':raw';
        for (1..100) { say $t3 $basic }
        close $t3 or croak "Unable to close $t3 after writing";
        push @digests, hexdigest_one_file($t3);

        say Dumper [ @digests ];
        cmp_ok($digests[0], 'eq', $digests[1], "Same md5_hex for $t1 and $t3");
        done_testing();

        Why this works I do not know. I think this is, at the very least, a deficiency in the File::Temp documentation and will file a bug report on it.

        Thank you for your assistance.

        Jim Keenan
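
One possible reading of why step 2 (closing the handle before digesting) matters, offered as an assumption because the original failing script is not shown in this thread: output printed to a Perl filehandle is buffered, so a second handle opened on the same file may see less data than was printed until the writer closes or flushes. The sketch below illustrates only that general behavior; the helper name size_on_disk is hypothetical, and the data is kept small so that it fits in the output buffer.

```perl
#!perl
use strict;
use warnings;
use IO::Handle;
use File::Temp qw( tempfile );

# Hypothetical helper: size of the file as currently visible on disk.
sub size_on_disk {
    my ($name) = @_;
    return (stat $name)[7];
}

my ($fh, $name) = tempfile( UNLINK => 1 );
binmode $fh, ':raw';

# 10 lines of 100 'x' plus a newline: 1010 bytes, small enough to stay buffered.
print {$fh} 'x' x 100, "\n" for 1 .. 10;

# Measured while the writer is still open and unflushed.
my $before = size_on_disk($name);

# close (or $fh->flush) pushes the buffered data out to the file.
close $fh or die "Unable to close $name: $!";
my $after = size_on_disk($name);

print "before close: $before bytes\n";
print "after close:  $after bytes\n";
```

If this is what happened in the original script, calling $fh->flush before digesting would be an alternative to closing the handle when it is still needed for further writes.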
