in reply to Identical Files to Symbolic Links
    while(@fnames) {
        my $f = shift @fnames;
        for my $f2 (@fnames) {
            last if($size{$f} != $size{$f2});
That's O(n²)! You'd get a significant boost by building a set of lists of files that share the same size:
    my %filesets;
    for ( @fnames ) {
        push @{ $filesets{ -s $_ } }, $_;
    }
    # filter out any lists that have only one element:
    for ( keys %filesets ) {
        @{$filesets{$_}} <= 1 and delete $filesets{$_};
    }
You could use checksums to get each list down to a set of "highly likely" candidate duplicates:
    my %filesets;
    for ( @fnames ) {
        my $size = -s $_;
        my $csum = `sum "$_"`;
        push @{ $filesets{$size.$csum} }, $_;
    }
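If shelling out to `sum` for every file is a concern (quoting of odd file names, one process per file), a rough alternative is to compute the checksum in-process. The sketch below swaps in the core Digest::MD5 module for `sum`, and only checksums files that already share a size with at least one other file. The helper name `candidate_sets` is made up for illustration, and the result is still only a list of likely duplicates, not confirmed ones.

```perl
use strict;
use warnings;
use Digest::MD5;

# candidate_sets() is a hypothetical helper, not part of the original post.
# It returns a list of array refs, each holding file names that share both
# size and MD5 digest.
sub candidate_sets {
    my @fnames = @_;

    # Pass 1: bucket by size, as in the size-only version.
    my %by_size;
    for my $f (@fnames) {
        my $size = -s $f;
        next unless defined $size;      # undef means the file does not exist
        push @{ $by_size{$size} }, $f;
    }

    # Pass 2: checksum only files whose size is shared with another file.
    my %filesets;
    for my $size (keys %by_size) {
        my @same = @{ $by_size{$size} };
        next if @same < 2;
        for my $f (@same) {
            open my $fh, '<', $f or next;   # skip unreadable files
            binmode $fh;
            my $csum = Digest::MD5->new->addfile($fh)->hexdigest;
            close $fh;
            push @{ $filesets{"$size:$csum"} }, $f;
        }
    }

    # Keep only groups with more than one member.
    return grep { @$_ > 1 } values %filesets;
}
```

Calling `candidate_sets(@fnames)` gives back one array ref per group of candidates; files with a unique size are never read at all.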
But you'd probably still want to do actual file comparisons (`cmp`) to confirm that the candidates really are duplicates.
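For that final check you don't necessarily need the external `cmp`: the core File::Compare module does a byte-for-byte comparison from within Perl. Here is a minimal sketch, assuming you already have one list of candidate file names (same size, same checksum); the helper name `confirmed_duplicates` is invented for this example.

```perl
use strict;
use warnings;
use File::Compare;

# confirmed_duplicates() is a hypothetical helper, not part of the original
# post. Given one candidate set, it splits it into groups of files whose
# contents are byte-for-byte identical and returns the groups with 2+ members.
sub confirmed_duplicates {
    my @candidates = @_;
    my @groups;

    FILE: for my $f (@candidates) {
        for my $g (@groups) {
            # compare() returns 0 if the files are equal, 1 if they differ,
            # and -1 on error; only an exact 0 counts as a match here.
            if (compare($f, $g->[0]) == 0) {
                push @$g, $f;
                next FILE;
            }
        }
        push @groups, [$f];     # no match: start a new group
    }

    return grep { @$_ > 1 } @groups;
}
```

Each file is compared against one representative per group, so within a candidate set that really does contain only duplicates this costs one comparison per file.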
Replies are listed 'Best First'.
Re^2: Identical Files to Symbolic Links
by Aristotle (Chancellor) on Nov 10, 2005 at 02:57 UTC
by jdporter (Paladin) on Nov 10, 2005 at 03:45 UTC
by Aristotle (Chancellor) on Nov 10, 2005 at 04:01 UTC
by jdporter (Paladin) on Nov 10, 2005 at 15:38 UTC
by blazar (Canon) on Nov 10, 2005 at 13:36 UTC