in reply to Re: scalable duplicate file remover
in thread scalable duplicate file remover

First of all, thank you very much for the critique; it is very welcome on my part.
I will use it to improve the program.
1) Why do you think the current method of opening the files does not yield correct results?
(I compared my SHA1 results against the Unix sha1sum utility and they came out OK; that's
why I'm asking.)
2) You are right; I will do this.
3) OK, I understand. Where could I read more about this?
4) As I read the documentation, and given that a number in base 10 should always have more
digits than its representation in base 16, I don't understand how it could be shorter in base 10.
I don't get why they say I will get a shorter string in a lower base.


Also, they talk about using a single sha1 object and reusing it, since the reset() method
can clear out the old data from it.
Do you think this will speed things up?

Re^3: scalable duplicate file remover
by jwkrahn (Abbot) on Mar 03, 2008 at 18:18 UTC
    1. From the documentation for Digest::SHA1:

          $sha1->addfile($io_handle)
      [ SNIP ]
              In most cases you want to make sure that the $io_handle is in "binmode" before you pass it as argument to the addfile() method.
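
      For illustration, here is a minimal sketch of that pattern with Digest::SHA1 (the file name is hypothetical):

      use strict;
      use warnings;
      use Digest::SHA1;

      my $file = 'example.dat';    # hypothetical path, for illustration only

      open my $fh, '<', $file or die "Cannot open '$file': $!";
      binmode $fh;                 # raw bytes, so no line-ending translation can alter the digest

      my $sha1 = Digest::SHA1->new;
      $sha1->addfile($fh);         # feeds the whole handle to the digest
      print $sha1->hexdigest, "\n";

      close $fh;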

    2. OK.    :-)

    3. Typeglobs and Filehandles
      How do I pass filehandles between subroutines?
      How can I use a filehandle indirectly?
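
      As a minimal sketch of what those entries describe: a lexical filehandle is an ordinary scalar, so it can be passed to a subroutine like any other argument (the file name is hypothetical):

      use strict;
      use warnings;

      sub first_line {
          my ($fh) = @_;            # the filehandle arrives as a plain scalar
          return scalar <$fh>;
      }

      open my $fh, '<', 'example.txt' or die "Cannot open: $!";
      print first_line($fh);
      close $fh;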

    4. $sha1->digest returns the digest in binary form (raw bytes), while $sha1->hexdigest returns it in hexadecimal form. Each byte of the binary digest corresponds to two hexadecimal characters, which is why the binary string is half the length. For example:

      $ perl -le'
      my $digest     = "\x02\x07\xFA\x78";
      my $hex_digest = "0207FA78";
      print for length( $digest ), length( $hex_digest );
      '
      4
      8

    5. Update: reset() may or may not speed things up. You would have to compare both methods with Benchmark to be sure.
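
      As a sketch of such a comparison (the data size and iteration count are arbitrary):

      use strict;
      use warnings;
      use Benchmark qw(cmpthese);
      use Digest::SHA1;

      my $data   = 'x' x 1_000;
      my $reused = Digest::SHA1->new;

      # Compare constructing a fresh object per digest against
      # reusing one object and calling reset() each time.
      cmpthese( 100_000, {
          new_object => sub {
              my $sha1 = Digest::SHA1->new;
              $sha1->add($data);
              my $hex = $sha1->hexdigest;
          },
          reused_object => sub {
              $reused->reset;
              $reused->add($data);
              my $hex = $reused->hexdigest;
          },
      });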