Re: Find duplicate files with exact same files noted

my @file_list;
sub files_wanted {
    my $text = $File::Find::name;
    if ( -f ) {
        push @file_list, $text;
    }
}
#If you set a base directory above, you will need to change $directory to $base_directory.$directory.
find(\&files_wanted,$directory);

#This section creates a hash of arrays of files, with the hash keys being filename.ext and the file size in
#parentheses. The raw file name is the entire path including the file name.
my %files;
for my $raw_file (@file_list) {
    my @file_parts = split(/\//,$raw_file);
    my $file = pop @file_parts;
    my $file_size = -s $raw_file;
    push @{$files{"$file ($file_size bytes)"}}, $raw_file;
}

Why traverse the directory tree twice (and stat each file twice) when you only have to traverse it once?

use File::Find;

my %files;
find sub {
    # -f stats the file; -s _ reuses that stat buffer instead of stat-ing again
    if ( -f ) {
        push @{ $files{ "$_ (" . ( -s _ ) . " bytes)" } }, $File::Find::name;
    }
}, $directory;
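A minimal self-contained sketch of the same single-pass idea, extended to report only the keys that more than one path shares. The helper names `group_by_name_and_size` and `duplicates_only`, and taking the directory from the command line, are assumptions for illustration and not part of either poster's script. Matching name and size only flags candidates; confirming that two files really are identical would still mean comparing their contents.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk $dir once and group every plain file under a
# "name (size bytes)" key, the same key format used in the thread.
sub group_by_name_and_size {
    my ($dir) = @_;
    my %files;
    find( sub {
        # -f stats the file once; -s _ reuses that stat buffer.
        return unless -f;
        push @{ $files{ "$_ (" . ( -s _ ) . " bytes)" } }, $File::Find::name;
    }, $dir );
    return \%files;
}

# Hypothetical helper: keep only the keys shared by more than one path.
# Same name and size makes files duplicate candidates, not proven duplicates.
sub duplicates_only {
    my ($files) = @_;
    my %dups;
    for my $key ( keys %$files ) {
        $dups{$key} = $files->{$key} if @{ $files->{$key} } > 1;
    }
    return \%dups;
}

# Usage (assumed): perl <script> DIRECTORY
if (@ARGV) {
    my $dups = duplicates_only( group_by_name_and_size( $ARGV[0] ) );
    for my $key ( sort keys %$dups ) {
        print "$key\n";
        print "    $_\n" for sort @{ $dups->{$key} };
    }
}
```

Splitting the walk from the filtering keeps the single `find` pass intact while letting the caller decide what counts as a duplicate.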


Re^2: Find duplicate files with exact same files noted by Lady_Aleena (Priest) on Aug 17, 2010 at 19:18 UTC
Wow! I didn't realize that I was traversing the tree twice until you said something. Maybe that is why it took a little while to run. I didn't use your exact suggestion, but I did merge the two pieces into one. This:

my @file_list;
sub files_wanted {
    my $text = $File::Find::name;
    if ( -f ) {
        push @file_list, $text;
    }
}
find(\&files_wanted,$directory);

my %files;
for my $raw_file (@file_list) {
    my @file_parts = split(/\//,$raw_file);
    my $file = pop @file_parts;
    my $file_size = -s $raw_file;
    push @{$files{"$file ($file_size bytes)"}}, $raw_file;
}

is now this:

my %files;
sub files_wanted {
    my $raw_file = $File::Find::name;
    if ( -f ) {
        my ($volume,$directories,$file) = File::Spec->splitpath($raw_file); #update from a prior suggestion.
        my $file_size = -s $raw_file;
        push @{$files{"$file ($file_size bytes)"}}, $raw_file;
    }
}
find(\&files_wanted,$directory);

The script now runs a little faster since I removed the double traversal of the directory tree. Thanks for showing me what I was really doing!

Have a cookie and a very nice day!
Lady Aleena
Re^3: Find duplicate files with exact same files noted by jwkrahn (Abbot) on Aug 17, 2010 at 20:31 UTC
my %files;
sub files_wanted {
    my $raw_file = $File::Find::name;
    if ( -f ) {
        my ($volume,$directories,$file) = File::Spec->splitpath($raw_file); #update from a prior suggestion.
        my $file_size = -s $raw_file;
        push @{$files{"$file ($file_size bytes)"}}, $raw_file;
    }
}

While you are in the "wanted" subroutine that `File::Find::find` runs, the full path is in the `$File::Find::name` variable and only the file name is in the `$_` variable, so there is no need to use `File::Spec->splitpath()` to do something that `File::Find::find` has already done for you. Also, you are still calling stat on the same file twice (once for `-f` and again for `-s`) when it would be more efficient to do it only once.
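Applied to the named `files_wanted` sub from the previous reply, those two suggestions might look like the sketch below. Reading `$directory` from the command line and printing the groups are assumptions added for illustration. A side benefit: because `find()` changes into each directory by default, `-s $File::Find::name` only locates the file when the starting directory is an absolute path, whereas `-s _` (which reuses the stat done by `-f $_`) works either way.

```perl
use strict;
use warnings;
use File::Find;

my %files;

# Revised "wanted" sub: find() has already split the path, so $_ is the bare
# file name (find() chdirs into each directory by default) and
# $File::Find::name is the full path. -f stats the file once; -s _ reuses
# that stat buffer instead of calling stat a second time.
sub files_wanted {
    if ( -f $_ ) {
        my $file_size = -s _;
        push @{ $files{"$_ ($file_size bytes)"} }, $File::Find::name;
    }
}

# Assumption: the starting directory is given on the command line.
my $directory = shift @ARGV;
if ( defined $directory ) {
    find( \&files_wanted, $directory );
    for my $key ( sort keys %files ) {
        print "$key\n";
        print "    $_\n" for @{ $files{$key} };
    }
}
```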