This is my first post here, so please feel free to redirect this to any other section if this is not the place where it belongs.
I am posting this little script seeking your opinions on every aspect: design, layout, readability, speed, etc. It uses File::Find::Duplicates to find duplicate files recursively in a directory and, instead of just reporting or deleting them, it replaces them with hardlinks so that the disk space is freed but the files remain. I wrote it to practice some of the things I'm trying to learn, but I found it quite useful for my /home directory (it freed 2 GB!).
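For anyone unfamiliar with the mechanism: a hard link makes two directory entries point at the same inode, so the data is stored only once. A minimal sketch of the core primitive (the file names here are made up):

#!/usr/bin/perl
use strict;
use warnings;

# Both names now refer to the same inode; deleting one of them does
# not remove the data as long as the other remains.
link 'report.txt', 'report-copy.txt'
    or die "link failed: $!";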
I was also pleasantly surprised by how fast it is. I haven't benchmarked it properly (I haven't read the Benchmark documentation yet), but it is noticeably faster than, for example, the fdupes program that ships with Ubuntu (and probably other Debian-based distros).
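For what it's worth, a rough timing of the scan could be done with the core Benchmark module; this is only a sketch, and the directory path is a placeholder:

use strict;
use warnings;
use Benchmark qw(timethis);
use File::Find::Duplicates;

# Run the duplicate scan 5 times over a test directory (placeholder
# path) and print the elapsed time per iteration.
timethis( 5, sub { my @dupes = find_duplicate_files('/some/test/dir') } );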
Of course, the credit for this goes entirely to Tony Bowden, the author of the module.
Here's the code for it:
#!/usr/bin/perl -w
use strict;
use File::Find::Duplicates;
use File::Temp ();

my %stats = ( files_linked => 0, space_saved => 0 );

# Print lists one item per line.
local $" = "\n";

# Read directory from command line, or default to current.
my $directory = $ARGV[0] || ".";

# Find duplicates recursively in that directory.
my @dupes = find_duplicate_files($directory);

# For each set of duplicate files, create the hardlinks and save the
# information in the stats hash.
foreach my $set (@dupes) {
    print $set->size, " bytes each:\n", "@{ $set->files }\n";

    my $original      = shift @{ $set->files };
    my $number_linked = fuse( $original, $set->files );

    $stats{files_linked} += $number_linked;
    $stats{space_saved}  += $number_linked * $set->size;
}

# Report the stats.
print "Files linked: $stats{files_linked}\n";
print "Space saved: $stats{space_saved} bytes\n";

sub fuse {

    # Replace duplicates with hard links and return the number
    # of links created.
    my $original     = shift;
    my $duplicates   = shift;
    my $files_linked = 0;

    foreach my $duplicate (@$duplicates) {

        # Step 1: link the original under a temporary name in the same
        # directory, so the rename below stays on one filesystem.
        my $tempfile = File::Temp::tempnam( $directory, 'X' x 6 );
        link $original, $tempfile or next;

        # Step 2: atomically move the tempfile over the duplicate.
        # If the rename fails, clean up the tempfile instead of
        # leaving it behind.
        unless ( rename $tempfile, $duplicate ) {
            unlink $tempfile
                or die "Couldn't delete temporary file $tempfile: $!";
            next;
        }

        ++$files_linked;
    }

    return $files_linked;
}
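To check that two paths really were fused, comparing the device and inode numbers that stat returns is enough. A quick sketch (the paths are placeholders):

use strict;
use warnings;

# Two paths are the same file on disk when they share a device and
# an inode number (stat fields 0 and 1).
my ( $dev_a, $ino_a ) = stat '/home/me/a.iso' or die "stat: $!";
my ( $dev_b, $ino_b ) = stat '/home/me/b.iso' or die "stat: $!";
print "hardlinked\n" if $dev_a == $dev_b && $ino_a == $ino_b;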
Update: Subroutine fuse() changed following betterworld's suggestion.