Here's how I might have coded this. Disclaimer: I've only done a small amount of testing on this, so use it at your own risk!

#!/usr/bin/env perl
use warnings;
use strict;
use Time::Piece;
use File::Spec::Functions qw/ no_upwards catfile /;
use Getopt::Long qw/ HelpMessage :config posix_default gnu_compat
    bundling auto_help /;
use Data::Dumper;
$Data::Dumper::Quotekeys=0;
$Data::Dumper::Useqq=$Data::Dumper::Sortkeys=1;

=head1 SYNOPSIS

 myrename.pl [OPTIONS] PATH
 OPTIONS:
   -r | --run        - Actually perform actions
   -v | --verbose    - With --run, report actions
   -q | --quiet      - Suppress warning messages
   -d | --debug      - Enable debugging (overrides -v and -q)

=cut

GetOptions(
    'r|run'      => \( my $RUN     ),
    'v|verbose'  => \( my $VERBOSE ),
    'q|quiet'    => \( my $QUIET   ),
    'd|debug'    => \( my $DEBUG   ),
    version    => sub { print q$myrename.pl v0.01$,"\n"; exit },
    ) or HelpMessage(-exitval=>255);
HelpMessage(-exitval=>255) unless @ARGV==1;
if ( $DEBUG ) { $VERBOSE=1; $QUIET=0; }
my $PATH = $ARGV[0];
print STDERR Data::Dumper->Dump([$PATH],['PATH']) if $DEBUG;

opendir my $dh, $PATH or die "$PATH: $!";
my @FILES = sort grep { -f catfile($PATH,$_) } no_upwards readdir $dh;
closedir $dh;
print STDERR Data::Dumper->Dump([\@FILES],['*FILES']) if $DEBUG;

my %files;
FILE: for my $origfile (@FILES) {
    my ($uid,$time,$file) = $origfile =~ /\A(\d+_)((?:\d+_){6})(.+)\z/
        or do { warn "No match, skipping $origfile\n"
            unless $QUIET; next FILE };
    print STDERR Data::Dumper->Dump([$uid,$time,$file],
        [qw/uid time file/]) if $DEBUG;
    $time = Time::Piece->strptime($time, '%Y_%m_%d_%H_%M_%S_')->epoch;
    push @{ $files{$file} }, { origfile => $origfile, time=>$time };
}
# sort each group newest first, so element 0 is the file to keep
@$_ = sort { $b->{time} <=> $a->{time} } @$_ for values %files;
print STDERR Data::Dumper->Dump([\%files],['*files']) if $DEBUG;

for my $file (sort keys %files) {
    my $keep = shift @{ $files{$file} };
    my $srcfile = catfile($PATH,$keep->{origfile});
    my $dstfile = catfile($PATH,$file);
    print "Rename $srcfile to $dstfile\n" if !$RUN || $VERBOSE;
    die "Destination file exists: $dstfile\n" if -e $dstfile;
    # NOTE: There is a possible race condition between -e and rename
    if ($RUN) {
        rename($srcfile, $dstfile)
            or die "rename($srcfile, $dstfile): $!";
    }
    for my $drop ( @{ $files{$file} } ) {
        my $dropfile = catfile($PATH,$drop->{origfile});
        print "Drop $dropfile\n" if !$RUN || $VERBOSE;
        if ($RUN) {
            unlink($dropfile) or die "unlink($dropfile): $!";
        }
    }
}
warn "This was a dry-run, no actions performed\n" unless $RUN;

For a set of files ( "2007_5_22_15_34_23_Table_-_2007522_XYZ_W3.pdf", "8_2007_5_22_15_34_23_Table_-_2007522_XYZ_W3.pdf", "8_2007_5_22_15_34_23_Table_-_2008522_XYZ_W3.pdf", "8_2007_5_22_22_34_12_Table_-_2007522_XYZ_W3.pdf" ) in a directory named x, the output is:

No match, skipping 2007_5_22_15_34_23_Table_-_2007522_XYZ_W3.pdf
Rename x/8_2007_5_22_22_34_12_Table_-_2007522_XYZ_W3.pdf to x/Table_-_2007522_XYZ_W3.pdf
Drop x/8_2007_5_22_15_34_23_Table_-_2007522_XYZ_W3.pdf
Rename x/8_2007_5_22_15_34_23_Table_-_2008522_XYZ_W3.pdf to x/Table_-_2008522_XYZ_W3.pdf

Update: I guess a few words of explanation would be helpful. First off, note that this loads the entire list of files into memory, but with 5k files, I think that should be fine. Most of the first half of the script is just setup and reading the list of files from the directory. The interesting stuff happens in the %files hash: its keys are the target filenames, and each value is an array of hashes, one per original file, holding the original filename and its datetime parsed into a UNIX timestamp (use the --debug switch to see the data structures). This allows me to simply sort each array by timestamp in descending order (the @$_ = sort ... step), so that the first element of each array is the most recent file. Then I loop over the target filenames, taking the first element of each array as the file to keep and rename to the target filename, and I delete all the other files in that array. I hope this makes sense, and feel free to ask if anything is unclear. (Note I used core modules only.)
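To make the data structure a bit more concrete, here is a minimal sketch of just the grouping and sorting steps, run on three of the sample filenames from above instead of a real directory. The hard-coded @names list and the keep/drop messages are only there for illustration and aren't part of the script; nothing gets renamed or deleted.

```perl
#!/usr/bin/env perl
use warnings;
use strict;
use Time::Piece;

# Illustration only: sample names from the example above, nothing is read from disk
my @names = qw(
    8_2007_5_22_15_34_23_Table_-_2007522_XYZ_W3.pdf
    8_2007_5_22_22_34_12_Table_-_2007522_XYZ_W3.pdf
    8_2007_5_22_15_34_23_Table_-_2008522_XYZ_W3.pdf
);

# target filename => [ { origfile => ..., time => epoch }, ... ]
my %files;
for my $origfile (@names) {
    my ($uid, $time, $file)
        = $origfile =~ /\A(\d+_)((?:\d+_){6})(.+)\z/ or next;
    my $epoch = Time::Piece->strptime($time, '%Y_%m_%d_%H_%M_%S_')->epoch;
    push @{ $files{$file} }, { origfile => $origfile, time => $epoch };
}

# sort each group newest first, so element 0 is the one to keep
@$_ = sort { $b->{time} <=> $a->{time} } @$_ for values %files;

for my $file (sort keys %files) {
    my ($keep, @drop) = @{ $files{$file} };
    print "keep $keep->{origfile} as $file\n";
    print "drop $_->{origfile}\n" for @drop;
}
```

Keying the hash on the target filename means that "which files collide?" and "which one is newest?" are both answered by one pass over the names plus a sort per group.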

In reply to Re: Batch file renaming - on identical name, keep only most recent file, based on dates (updated) by haukex
in thread Batch file renaming - on identical name, keep only most recent file, based on dates by Anonymous Monk
