DarthFredd has asked for the wisdom of the Perl Monks concerning the following question:

I built a script which looks for files which share the same name but different extensions, and deletes dupes. I would like some advice on best programming practices.

If foo.csc, foo.txt, and bar.txt are in the same directory, and $ext is set to csc, foo.txt will be deleted (it has a csc copy), but bar.txt will be left alone. My script:
use File::Find; my $ext = "dat"; my $nom = 0; find ( \&ripper, @ARGV) ; sub ripper { ## First implementation unless(s/\.$ext$//i) {return;} my $file = $_; #using $_ screws up grep $nom += unlink(grep(/\Q$wang\E\.(?!$ext)/, glob("*")),"\n"); } sub ripper2 { ## Second implementation my $file = $_; return unless(s/\.(?!$ext)$//i); my $priority = $_ . $ext; if(-e $priority && !-d $priority) { $nom += unlink $file; } }
I wrote the second implementation because I thought it would be cleaner. Is it? Given the choice, which one would you use? Every opinion shall be gratefully recieved.

Replies are listed 'Best First'.
Re: Best programming practice
by Sidhekin (Priest) on Oct 28, 2007 at 22:15 UTC

    The second feels the cleaner approach to me as well, but it is not correctly implemented. So I suppose one could argue it is too clever. Alternatively, one could fix it:

    sub ripper2b { ## Second implementation, take b my $file = $_; return unless s/\.(?!$ext)\w+$//i; my $priority = "$_.$ext"; if (-e $priority && !-d $priority) { $nom += unlink $file; } }
    My two fixes:
    • (?!$ext) is zero-width, so (?!$ext)$ is just $. We need to remove the extension as well, whatever it is, so add \w+ to the regex.
    • The "." was removed by the substitution, so we need to add it again along with $ext.

    If that does not make sense to you, the second implementation is just too clever. Go with the first.

    print "Just another Perl ${\(trickster and hacker)},"
    The Sidhekin proves Sidhe did it!

      I think you went the right way. The 2nd one is much closer to working than the first. I'm not even entirely sure what $wang was supposed to hold. Also, I would have missed the \w+.
      -- I used to drive a Heisenbergmobile, but everyone I looked at the speedometer, I got lost.
Re: Best programming practice
by jdporter (Paladin) on Oct 29, 2007 at 13:06 UTC

    This is how I'd approach it:

    use Getopt::Long; use File::Glob; use strict; use warnings; my $ext = 'csc'; GetOptions( 'ext=s' => \$ext, ); my $unl; for my $dir ( @ARGV ) { my %f; /(.*)\.(.*)/ and $f{$1}{$2} = $_ for File::Glob::bsd_glob( "$dir/*" ); for my $g ( grep { $f{$_}{$ext} and keys %{$f{$_}} > 1 } keys %f ) { delete $f{$g}{$ext}; $unl += unlink $f{$g}{$_} for sort keys %{ $f{$g} }; } } print "\n$unl files unlinked.\n";
    A word spoken in Mind will reach its own level, in the objective world, by its own weight
Re: Best programming practice
by tuxz0r (Pilgrim) on Oct 30, 2007 at 03:59 UTC
    Just curious, but I couldn't get either of your subroutines working. But, giving it some thought since I saw the post originally, I put together one which works with the semantics you mentioned, leaving unique filenames alone (ones with no dups) and removing duplicates of files with the same prefix (minus extension), saving the one prefix you pass in. Is this along the lines of what you're trying to do?
    my $working_dir = $ARGV[0]; # starting directory my $extension = $ARGV[1]; # extension to save dedup($working_dir); exit 0; sub dedup { my $path = shift; my @files = glob("$path/*"); print "Checking [$path] ...\n"; foreach (@files) { dedup("$_") if (-d $_ && -x $_); # recurse into subdire +ctories my ($base, $ext) = m/(.*)\.([^\.]+)$/; my @matches = glob("$base*"); print "\tremoving: $_\n" if scalar @matches > 1 && $ext ne $ex +tension; # unlink $_ if scalar @matches > 1 && $ext ne $exte +nsion; } }

    I'm not using File::Find here, but it does work recursively through accessible subdirectories. It only looks at dups within the same directory level, so if you have foo.* in multiple directories they won't be affected.

    This is a quick and dirty script, so it could be much better with some argument checking, adding an appropriate way to handle "." dot files and the like; but, it seems to do a fairly good job.

    If you can tell me what I might have missed running your code, let me know, I'd love to give it a run and compare the output.

    ---
    echo S 1 [ Y V U | perl -ane 'print reverse map { $_ = chr(ord($_)-1) } @F;'