Re: MD5 Hash
by gmargo (Hermit) on Dec 01, 2009 at 20:25 UTC
|
Presumably you're already using Digest, since this is apparently a follow-up to the thread file comparison using file open in binary mode.
In the Digest documentation there is a comparison of various digest speeds, with MD4 being the fastest.
If speed is really a huge deal, you could add an additional comparison stage,
like perhaps an MD5 over only the first 64K of each file.
Then if those match, do an MD5 over the whole file.
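For instance, a rough, untested sketch of that two-stage check might look like the following (Digest::MD5 is used just for illustration, and the quick_digest/full_digest helper names are made up; the same structure works with Digest::MD4):
use strict;
use warnings;
use Digest::MD5;

# Cheap first pass: hash only the first 64K of a file.
sub quick_digest {
    my $fileName = shift;
    open my $fh, '<', $fileName or die "Can't open $fileName: $!\n";
    binmode $fh;
    read $fh, my $buffer, 2**16;
    close $fh;
    return Digest::MD5->new->add($buffer)->hexdigest;
}

# Expensive second pass: hash the whole file.
sub full_digest {
    my $fileName = shift;
    open my $fh, '<', $fileName or die "Can't open $fileName: $!\n";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

my ($fileA, $fileB) = @ARGV;

# Only pay for the full hashes when the cheap ones already agree.
if (quick_digest($fileA) eq quick_digest($fileB)
    && full_digest($fileA) eq full_digest($fileB)) {
    print "files appear identical\n";
}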
| [reply] |
|
|
That sounds like a good idea.
First I will paste the code I have, to make sure it's not a code flaw.
What do you think?
sub md5sum {
    my $file   = shift;
    my $digest = "";
    eval {
        open(FILE, $file) or die "Can't find file $file\n";
        my $ctx = Digest::MD4->new;
        $ctx->addfile(*FILE);
        $digest = $ctx->hexdigest;
        close(FILE);
    };
    if ($@) {
        print $@;
        return "";
    }
    return $digest;
}
| [reply] [d/l] |
|
|
use strict;
use warnings;
use Digest::MD4;

sub md4sum {
    my $fileName = shift;
    my $digest   = "";
    eval {
        open my $file, '<', $fileName or die "Can't open $fileName: $!\n";
        my $buffer;
        read $file, $buffer, 2**16;
        close($file);
        my $ctx = Digest::MD4->new;
        $ctx->add($buffer);
        $digest = $ctx->hexdigest;
    };
    if ($@) {
        print $@;
        return "";
    }
    return $digest;
}
Update s/2\^16/2**16/. Thanks AnomalousMonk
True laziness is hard work
| [reply] [d/l] |
|
|
Thanks, that worked perfectly.
Thanks for the suggestions everyone.
| [reply] |
|
|
I like the idea of hashing just the first 64K of each file.
Could I build the hash using only the first 64K?
| [reply] |
Re: MD5 Hash
by moritz (Cardinal) on Dec 01, 2009 at 20:25 UTC
|
Most likely the bottleneck is not the hash computation, but the speed of the network drive. A faster alternative would be to compute the hash on the file server itself. | [reply] |
Re: MD5 Hash
by codeacrobat (Chaplain) on Dec 01, 2009 at 20:24 UTC
|
It depends on what you need the md5sum for.
A simple modification time + filename check might do it as well.
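Something along these lines, for example (untested sketch; picking name, size and mtime from stat() as the fingerprint is just one reasonable assumption):
use strict;
use warnings;
use File::Basename qw(basename);

# Build a cheap fingerprint from metadata instead of reading the file at all.
sub quick_fingerprint {
    my $path = shift;
    my ($size, $mtime) = (stat $path)[7, 9];    # byte size and modification time
    return join '|', basename($path), $size, $mtime;
}

my ($fileA, $fileB) = @ARGV;
print "probably identical\n"
    if quick_fingerprint($fileA) eq quick_fingerprint($fileB);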
print+qq(\L@{[ref\&@]}@{['@'x7^'!#2/"!4']});
| [reply] [d/l] |
Re: MD5 Hash
by roboticus (Chancellor) on Dec 01, 2009 at 20:29 UTC
|
Karger78:
You might want to flesh out your node with some more information. I don't know what you consider a "vast amount of time", but running a typical md5 program (md5sum) on my laptop took just under 52 seconds for a 292MB file. But I can't tell if it's faster than the program you're using or not...
Since you're on a perl site, I'm assuming that you're computing the hash with a perl program, perhaps using something from CPAN. If you provide a bit more information, perhaps we can be a bit more help.
...roboticus | [reply] |
|
|
620 MB across a network share takes over 10 min.
| [reply] |
|
|
Karger78:
Perhaps you can SSH into the system hosting the network share, and have that computer compute the md5? If you're managing a network with multiple network shares, you might get a good speed boost by making each host compute the md5 values for the files it serves.
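An untested sketch of that approach, with a made-up host name and path: let the serving box run md5sum locally and just ship the 32-character digest back over SSH.
use strict;
use warnings;

my $host = 'fileserver';                       # hypothetical host
my $path = '/exports/share/big_file.iso';      # hypothetical remote path

# md5sum prints "<digest>  <filename>"; keep only the digest.
my $output = `ssh $host md5sum $path`;
die "remote md5sum failed\n" if $? != 0;
my ($digest) = split ' ', $output;
print "$digest\n";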
...roboticus
| [reply] |
Re: MD5 Hash
by Khen1950fx (Canon) on Dec 02, 2009 at 00:48 UTC
|
I tested this script with Digest::MD4 and Digest::MD5. The iso was about 699MB. Both had about the same times (approx. 10s):
#!/sw/bin/perl
use strict;
use warnings;
use Digest::MD5;
my $file = '/Some/user/Desktop/Fedora/FC-6-i386-disc5.iso';
my $ctx = Digest::MD5->new;
open 'FILE', '<', $file or die "Can't open $file: $!\n";
$ctx->addfile(*FILE);
my $digest = print $ctx->hexdigest, "\n";
close($file);
| [reply] [d/l] |
|
|
$ perl -e '
> use Digest::MD5;
> $ctx = Digest::MD5->new();
> open FILE, q{<}, q{xxx} or die $!;
> $ctx->addfile( *FILE );
> $digest = print $ctx->hexdigest(), qq{\n};
> print qq{-->$digest<--\n};'
cc88f0aa880a1b97b84bfd0ebc420fa7
-->1<--
$
Note that $digest ends up holding print's return value (1), not the hex string, which is why -->1<-- shows up above. I hope this is useful.
| [reply] [d/l] [select] |