in reply to Re^2: Sort directory by file size
in thread Sort directory by file size
2. Since you want to use file size to determine when to do md5 checksums, I think it would make more sense to build a hash of arrays keyed by byte count: the key is the size and the value is an array holding the files of that size. Then loop over the hash and do md5s only for each set of two or more files with a given size. You don't really need to do any sorting - just keep track of the different sizes. Here's how I would do it (on a unix/linux system):
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5;

die "Usage: $0 dir1 dir2\n"
    unless ( @ARGV == 2 and -d $ARGV[0] and -d $ARGV[1] );

# Group file paths by size: key = byte count, value = array of files of that size
my %fsize;
for my $dir ( @ARGV ) {
    opendir DIR, $dir or die "$dir: $!\n";
    while ( my $fn = readdir DIR ) {
        next unless -f "$dir/$fn";
        push @{$fsize{ -s "$dir/$fn" }}, "$dir/$fn";
    }
}

# Checksum only files whose size is shared with at least one other file
my %fmd5;
my $digest = Digest::MD5->new;
for my $bc ( keys %fsize ) {
    next if scalar @{$fsize{$bc}} == 1;
    for my $fn ( @{$fsize{$bc}} ) {
        if ( open( my $fh, "<", $fn )) {
            $digest->new;    # called as an instance method, this resets the digest state
            $digest->addfile( $fh );
            push @{$fmd5{ $digest->b64digest }}, $fn;
        }
    }
}

# Report each set of two or more files with identical checksums
for my $md ( keys %fmd5 ) {
    print join( " == ", @{$fmd5{$md}} )."\n" if ( scalar @{$fmd5{$md}} > 1 );
}

(That just lists sets of files that have identical content; you can tweak it to do other things, as you see fit.)
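For example, if the script above is saved as finddups.pl (a name chosen here just for illustration) and pointed at two directories, each output line joins one set of duplicate files with " == "; the file names in this sample run are made up:

$ perl finddups.pl dir1 dir2
dir1/report.txt == dir2/report_copy.txt
dir1/logo.png == dir1/logo_old.png == dir2/logo.png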