use Digest::file qw(digest_file_hex);
# hex MD5 digest of the file's contents, computed in-process (no external md5sum)
my $file   = "/some/path/to/file";
my $md5sum = digest_file_hex($file, "MD5");
See md5 sum different on windows and unix for win.exe files !!? for how to use Perl's own MD5 support and avoid forking altogether. If I were writing a program that needed md5sums for thousands of files, I would probably set up a permanent worker thread that takes a file and returns its sum without forking a new process each time: start the C binary once inside the thread, feed the files to its STDIN with the '-' option, and collect the output back through IPC. That way you only start one fork per worker thread... and you could run several summing threads to speed it up.
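For what it's worth, here is a rough, untested sketch of that idea using plain Perl threads plus Digest::MD5 inside each worker instead of wrapping the C binary over IPC; the queue layout and the thread count of four are just my own assumptions:

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use Digest::MD5;

my $queue = Thread::Queue->new;

# Four summing workers (the count is arbitrary); each pulls a file name
# off the queue and digests it in-process with Digest::MD5, so no fork.
my @workers = map {
    threads->create(sub {
        while (defined(my $path = $queue->dequeue)) {
            open my $fh, '<:raw', $path or next;
            my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
            print "$hex  $path\n";
        }
    });
} 1 .. 4;

$queue->enqueue(@ARGV);              # file names to sum
$queue->enqueue(undef) for @workers; # one undef per worker means "stop"
$_->join for @workers;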
If you wanted to loop through all the files looking for duplicates, you could use File::Find to walk them and store the md5sums in a hash, as in the script below.
#!/usr/bin/perl -w
use strict;
use File::Find;
use Digest::MD5 qw(md5_hex);

# First pass: group files by size -- only same-sized files can be duplicates.
my %same_sized;
find sub {
    return unless -f and my $size = -s _;
    push @{$same_sized{$size}}, $File::Find::name;
}, @ARGV;

# Second pass: md5 each group of same-sized files and report any that match.
for (values %same_sized) {
    next unless (@ARGV = @$_) > 1;    # reuse @ARGV so <> reads just this group
    local $/;                         # slurp each file whole
    my %md5;
    while (<>) {
        push @{$md5{md5_hex($_)}}, $ARGV;
    }
    for (values %md5) {
        next unless (my @same = @$_) > 1;
        print join(" ", sort @same), "\n";
    }
}
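Run it with one or more directories as arguments; each output line lists one group of files with identical contents.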