Re: MD5 Hash
by gmargo (Hermit) on Dec 01, 2009 at 20:25 UTC
|
Presumably you're already using Digest, since this is apparently a follow-up to the thread file comparison using file open in binary mode.
In the Digest documentation there is a comparison of various digest speeds, with MD4 being the fastest.
If speed is really a huge deal, you could add an additional comparison stage,
like perhaps an MD5 over only the first 64K of each file.
Then if those match, do an MD5 over the whole file.
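For instance, a rough, untested sketch of that two-stage check might look like the following (Digest::MD5 is used just for illustration, and the quick_digest/full_digest helper names are made up; the same structure works with Digest::MD4):
use strict;
use warnings;
use Digest::MD5;

# Cheap first pass: hash only the first 64K of a file.
sub quick_digest {
    my $fileName = shift;
    open my $fh, '<', $fileName or die "Can't open $fileName: $!\n";
    binmode $fh;
    read $fh, my $buffer, 2**16;
    close $fh;
    return Digest::MD5->new->add($buffer)->hexdigest;
}

# Expensive second pass: hash the whole file.
sub full_digest {
    my $fileName = shift;
    open my $fh, '<', $fileName or die "Can't open $fileName: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

my ($fileA, $fileB) = @ARGV;

# Only pay for the full hashes when the cheap ones already agree.
if (quick_digest($fileA) eq quick_digest($fileB)
    && full_digest($fileA) eq full_digest($fileB)) {
    print "files appear identical\n";
}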
| [reply] |
|
|
That sounds like a good idea.
First I will paste the code I have, to make sure it's not a code flaw.
What do you think?
sub md5sum {
    my $file   = shift;
    my $digest = "";
    eval {
        open(FILE, $file) or die "Can't find file $file\n";
        my $ctx = Digest::MD4->new;
        $ctx->addfile(*FILE);
        $digest = $ctx->hexdigest;
        close(FILE);
    };
    if ($@) {
        print $@;
        return "";
    }
    return $digest;
}
| [reply] [d/l] |
|
|
use strict;
use warnings;
use Digest::MD4;

sub md4sum {
    my $fileName = shift;
    my $digest   = "";
    eval {
        open my $file, '<', $fileName or die "Can't open $fileName: $!\n";
        my $buffer;
        read $file, $buffer, 2**16;
        close($file);
        my $ctx = Digest::MD4->new;
        $ctx->add($buffer);
        $digest = $ctx->hexdigest;
    };
    if ($@) {
        print $@;
        return "";
    }
    return $digest;
}
Update s/2\^16/2**16/. Thanks AnomalousMonk
True laziness is hard work
| [reply] [d/l] |
|
|
Thanks, that worked perfectly.
Thanks for the suggestions everyone.
| [reply] |
|
|
I like the idea of hashing just the first 64K of each file.
Could I build the hash using only the first 64K?
| [reply] |
Re: MD5 Hash
by moritz (Cardinal) on Dec 01, 2009 at 20:25 UTC
|
Most likely the bottleneck is not the hash computation, but the speed of the network drive. A faster alternative would be to compute the hash on the file server itself. | [reply] |
Re: MD5 Hash
by codeacrobat (Chaplain) on Dec 01, 2009 at 20:24 UTC
|
It depends on what you need the md5sum for.
A simple modification time + filename check might do it as well.
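Something along these lines, for example (untested sketch; picking name, size and mtime from stat() as the fingerprint is just one reasonable assumption):
use strict;
use warnings;
use File::Basename qw(basename);

# Build a cheap fingerprint from metadata instead of reading the file at all.
sub quick_fingerprint {
    my $path = shift;
    my ($size, $mtime) = (stat $path)[7, 9];    # byte size and modification time
    return join '|', basename($path), $size, $mtime;
}

my ($fileA, $fileB) = @ARGV;
print "probably identical\n"
    if quick_fingerprint($fileA) eq quick_fingerprint($fileB);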
print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
| [reply] [d/l] |
Re: MD5 Hash
by roboticus (Chancellor) on Dec 01, 2009 at 20:29 UTC
|
Karger78:
You might want to flesh out your node with some more information. I don't know what you consider a "vast amount of time", but running a typical md5 program (md5sum) on my laptop took just under 52 seconds for a 292MB file. But I can't tell if it's faster than the program you're using or not...
Since you're on a perl site, I'm assuming that you're computing the hash with a perl program, perhaps using something from CPAN. If you provide a bit more information, perhaps we can be a bit more help.
...roboticus | [reply] |
|
|
620 MB across a network share takes over 10 min.
| [reply] |
|
|
Karger78:
Perhaps you can SSH into the system hosting the network share, and have that computer compute the md5? If you're managing a network with multiple network shares, you might get a good speed boost by making each host compute the md5 values for the files it serves.
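An untested sketch of that approach, with a made-up host name and path: let the serving box run md5sum locally and just ship the 32-character digest back over SSH.
use strict;
use warnings;

my $host = 'fileserver';                       # hypothetical host
my $path = '/exports/share/big_file.iso';      # hypothetical remote path

# md5sum prints "<digest>  <filename>"; keep only the digest.
my $output = `ssh $host md5sum $path`;
die "remote md5sum failed\n" if $? != 0;
my ($digest) = split ' ', $output;
print "$digest\n";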
...roboticus
| [reply] |
Re: MD5 Hash
by Khen1950fx (Canon) on Dec 02, 2009 at 00:48 UTC
|
I tested this script with Digest::MD4 and Digest::MD5. The iso was about 699MB. Both had about the same times (approx. 10s):
#!/sw/bin/perl
use strict;
use warnings;
use Digest::MD5;
my $file = '/Some/user/Desktop/Fedora/FC-6-i386-disc5.iso';
my $ctx = Digest::MD5->new;
open 'FILE', '<', $file or die "Can't open $file: $!\n";
$ctx->addfile(*FILE);
my $digest = print $ctx->hexdigest, "\n";
close($file);
| [reply] [d/l] |
|
|
$ perl -e '
> use Digest::MD5;
> $ctx = Digest::MD5->new();
> open FILE, q{<}, q{xxx} or die $!;
> $ctx->addfile( *FILE );
> $digest = print $ctx->hexdigest(), qq{\n};
> print qq{-->$digest<--\n};'
cc88f0aa880a1b97b84bfd0ebc420fa7
-->1<--
$
Note that $digest ends up holding print's return value (1), not the hex string, which is why -->1<-- shows up above. I hope this is useful.
| [reply] [d/l] [select] |