in reply to Re: scalable duplicate file remover
in thread scalable duplicate file remover

First of all, thank you very much for the critique; it is very welcome on my part.
I will use it to improve the program.
1) Why do you think the current method of opening the files does not yield correct results?
(I compared my SHA1 results against the Unix sha1sum utility and they came out OK; that's
why I'm asking.)
2) You are right; I will do this.
3) OK, I understand. Where could I read more about this?
4) As I read the documentation, and given that a number in base 10 should always have more
digits than its representation in base 16, I don't understand how it could be shorter in base 10.
I don't get why they say I will get a shorter string in a lower base.


Also, they talk about using a single sha1 object and reusing it, since the reset() method
can clear out the old data from it.
Do you think this will speed things up?

Re^3: scalable duplicate file remover
by jwkrahn (Abbot) on Mar 03, 2008 at 18:18 UTC
    1. From the documentation for Digest::SHA1:

          $sha1->addfile($io_handle)
      [ SNIP ]
              In most cases you want to make sure that the $io_handle is in "binmode" before you pass it as argument to the addfile() method.
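
      For illustration, here is a minimal sketch of that pattern with Digest::SHA1 (the file name is hypothetical):

      use strict;
      use warnings;
      use Digest::SHA1;

      my $file = 'example.dat';    # hypothetical path, for illustration only

      open my $fh, '<', $file or die "Cannot open '$file': $!";
      binmode $fh;                 # raw bytes, so no line-ending translation can alter the digest

      my $sha1 = Digest::SHA1->new;
      $sha1->addfile($fh);         # feeds the whole handle to the digest
      print $sha1->hexdigest, "\n";

      close $fh;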

    2. OK.    :-)

    3. Typeglobs and Filehandles
      How do I pass filehandles between subroutines?
      How can I use a filehandle indirectly?
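
      As a minimal sketch of what those entries describe: a lexical filehandle is an ordinary scalar, so it can be passed to a subroutine like any other argument (the file name is hypothetical):

      use strict;
      use warnings;

      sub first_line {
          my ($fh) = @_;            # the filehandle arrives as a plain scalar
          return scalar <$fh>;
      }

      open my $fh, '<', 'example.txt' or die "Cannot open: $!";
      print first_line($fh);
      close $fh;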

    4. $sha1->digest returns the digest in binary form (raw bytes), while $sha1->hexdigest returns it in hexadecimal form. Each byte of the binary digest corresponds to two hexadecimal characters, which is why the binary string is half the length. For example:

      $ perl -le'
      my $digest     = "\x02\x07\xFA\x78";
      my $hex_digest = "0207FA78";
      print for length( $digest ), length( $hex_digest );
      '
      4
      8

    5. Update: reset() may or may not speed things up. You would have to compare both methods with Benchmark to be sure.
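
      As a sketch of such a comparison (the data size and iteration count are arbitrary):

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);
      use Digest::SHA1;

      my $data   = 'x' x 1_000;
      my $reused = Digest::SHA1->new;

      # Compare constructing a fresh object per digest against
      # reusing one object and calling reset() each time.
      cmpthese( 100_000, {
          new_object => sub {
              my $sha1 = Digest::SHA1->new;
              $sha1->add($data);
              my $hex = $sha1->hexdigest;
          },
          reused_object => sub {
              $reused->reset;
              $reused->add($data);
              my $hex = $reused->hexdigest;
          },
      });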