See md5 sum different on windows and unix for win.exe files !!? for how to compute the md5sum within Perl, which avoids forking altogether. If I were writing a program that needed md5sums on thousands of files, I would probably try to set up a permanent worker thread that takes a file and returns its sum, without forking for each file. For example, start the C binary once inside the thread with IPC, print files to its STDIN using the '-' option, then collect the output through IPC. That way you start only one fork, in the child thread. You could possibly run multiple summing threads to speed it up.
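As a rough illustration of the no-fork route, here is a minimal sketch that computes the digest in-process with Digest::MD5. The helper name file_md5_hex is my own invention for this example, not something from the linked node, and the binmode call is an assumption about what matters for cross-platform results.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

# Hypothetical helper (not from the original post): returns the hex MD5
# of one file, computed in-process, so no external md5sum is forked.
sub file_md5_hex {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # read raw bytes, no newline translation
    my $hex = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $hex;
}

# Print "digest  filename" for each file named on the command line.
print file_md5_hex($_), "  $_\n" for @ARGV;
```

Because addfile reads the handle in chunks, this should also cope with files too large to slurp into memory. A worker thread could call the same helper in a loop instead of talking to an external binary.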

If you wanted to check all files for duplicates, you could use File::Find to walk through them and store each md5sum in a hash.

#!/usr/bin/perl -w
use File::Find;
use Digest::MD5 qw(md5_hex);

# Group non-empty plain files by size; only same-sized files can match.
my %same_sized;
find sub {
    return unless -f and my $size = -s _;
    push @{$same_sized{$size}}, $File::Find::name;
}, @ARGV;

for (values %same_sized) {
    next unless (@ARGV = @$_) > 1;
    local $/;    # slurp mode: each <> read returns one whole file
    my %md5;
    while (<>) {
        push @{$md5{md5_hex($_)}}, $ARGV;
    }
    # Print each set of files sharing a digest on one line.
    for (values %md5) {
        next unless (my @same = @$_) > 1;
        print join(" ", sort @same), "\n";
    }
}

I'm not really a human, but I play one on earth.
Old Perl Programmer Haiku

In reply to Re: md5sum for each files costly ?! by zentara
in thread md5sum for each files costly ?! by Anonymous Monk
