
  1. From the documentation for Digest::SHA1:

        $sha1->addfile($io_handle)
    [ SNIP ]
            In most cases you want to make sure that the $io_handle is in "binmode" before you pass it as argument to the addfile() method.

  2. OK.    :-)

  3. See "Typeglobs and Filehandles" in perldata, and these perlfaq5 entries:
    How do I pass filehandles between subroutines?
    How can I use a filehandle indirectly?

  4. $sha1->digest returns the digest as a binary string, while $sha1->hexdigest returns the same digest encoded in hexadecimal, so it is twice as long. For example:

    $ perl -le'
    my $digest     = "\x02\x07\xFA\x78";
    my $hex_digest = "0207FA78";
    print for length( $digest ), length( $hex_digest );
    '
    4
    8

  5. Update: reset() may or may not speed things up. You would have to compare both methods with Benchmark to be sure.
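
To make points 1, 3 and 5 concrete, here is a rough sketch of how one might run that comparison. Some assumptions on my part: it uses the core Digest::SHA module with algorithm 1 as a stand-in for Digest::SHA1 (as far as I know both provide addfile(), hexdigest() and reset()), file_digest() is a made-up helper name, and the 64 KB temporary file is only placeholder data. Treat it as a starting point rather than a definitive benchmark.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use Digest::SHA;                 # assumption: core module standing in for Digest::SHA1
use File::Temp qw(tempfile);

# Hypothetical helper: digest a file through a handle passed in by the caller.
# A lexical filehandle is an ordinary scalar, so it can be handed to a sub directly.
sub file_digest {
    my ($sha, $fh) = @_;
    binmode $fh;                             # binmode before addfile(), per the docs
    seek $fh, 0, 0 or die "seek failed: $!"; # rewind so each call reads the whole file
    $sha->addfile($fh);
    return $sha->hexdigest;                  # 40 hex characters for SHA-1
}

# Placeholder input: 64 KB of dummy data in a temporary file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "x" x 65536;
close $tmp_fh or die "close failed: $!";

open my $fh, '<', $tmp_name or die "Cannot open $tmp_name: $!";

my $shared = Digest::SHA->new(1);

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, {
    new_each_time => sub { file_digest(Digest::SHA->new(1), $fh) },
    reset_shared  => sub { $shared->reset; file_digest($shared, $fh) },
});
```

Because most of the time in each iteration probably goes to reading and hashing the file, the two variants may well come out nearly identical. Run it against files of the size you actually expect before drawing conclusions.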