Re: Delete the file with checking the value

#!/usr/bin/perl
use strict;
use warnings;

my %hash;

while (<DATA>) {
    my ($input_file, $time, $date) = split ' ';

    # determine common part of file names
    my $common = $input_file;
    $common =~ s/^\d+//;   # remove leading number

    # collect info (name, timestamp) in hash, keyed by common part of file names
    push @{$hash{$common}}, [ $input_file, sprintf("%s %6d",$date,$time) ];
}

# use Data::Dumper;
# print Dumper \%hash;  # debug

for my $k (keys %hash) {  # for all file sets

    # sort by timestamp
    my @files = sort {$b->[1] cmp $a->[1]} @{$hash{$k}};
    
    # remove most recent file (the one to keep) from list
    shift @files;

    # delete remaining (older) files
    unlink map $_->[0], @files;
}

__DATA__
508.ids.xml    70857 2004-10-02
1508.ids.xml   70859 2004-10-02
1509.id123.xml 2000 2004-10-02
1400.id123.xml 4000 2004-10-01

(I left out the XML stuff, as the OP doesn't seem to have problems with that part.)
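
Since `unlink` is destructive, it can help to try the selection logic as a dry run first. The following is only a sketch under assumptions: `older_files` is a hypothetical helper (not part of the post above) that takes `[name, time, date]` records and returns the names that would be deleted, using the same grouping rule and padded sort key as the script. Like the script, it assumes time values of at most six digits.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original post): given a list of
# [name, time, date] records, return the names of all files except the
# most recent one in each set.
sub older_files {
    my @records = @_;
    my %sets;
    for my $r (@records) {
        my ($name, $time, $date) = @$r;
        (my $common = $name) =~ s/^\d+//;    # same grouping rule as the script
        push @{ $sets{$common} }, [ $name, sprintf("%s %6d", $date, $time) ];
    }
    my @older;
    for my $k (sort keys %sets) {            # sorted only for stable output
        my @files = sort { $b->[1] cmp $a->[1] } @{ $sets{$k} };
        shift @files;                        # keep the most recent file
        push @older, map { $_->[0] } @files;
    }
    return @older;
}

# Same records as in the __DATA__ section of the script.
my @records = (
    [ '508.ids.xml',    70857, '2004-10-02' ],
    [ '1508.ids.xml',   70859, '2004-10-02' ],
    [ '1509.id123.xml', 2000,  '2004-10-02' ],
    [ '1400.id123.xml', 4000,  '2004-10-01' ],
);

my @older = older_files(@records);
print "would delete: $_\n" for @older;

# To delete for real, unlinking one file at a time lets failures be reported:
# for my $f (@older) {
#     unlink $f or warn "could not delete $f: $!\n";
# }
```

With the sample records this lists `1400.id123.xml` and `508.ids.xml`. Note that `unlink` returns the number of files it deleted, so calling it with a whole list (as the script does) does not tell you which file failed.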

Re^2: Delete the file with checking the value by Anonymous Monk on Feb 28, 2010 at 05:40 UTC
Thanks for the help.

    __DATA__
    508.ids.xml 70857 2004-10-01
    1508.ids.xml 70859 2004-10-01
    1509.id123.xml 2000 2004-10-01
    1400.id123.xml 4000 2004-10-01

How do I compare the times if the date is the same?
Re^3: Delete the file with checking the value by almut (Canon) on Feb 28, 2010 at 09:45 UTC
As usual, there are several ways to do it. In the sample code I've taken care of it by tagging the time value onto the end of the date string, aligning it such that the whole string can simply be sorted asciibetically to yield the proper result. Note that space (ASCII 32) orders before digits (ASCII 48..57). This is done with the `sprintf("%s %6d",$date,$time)`.

With the following sample input

    __DATA__
    0.ids.xml 500 2004-10-01
    1.ids.xml 2 2004-10-01
    2.ids.xml 30 2004-10-01
    3.ids.xml 600 2004-10-01
    4.ids.xml 40 2004-10-01
    5.ids.xml 7000 2004-10-01
    6.ids.xml 8000 2004-10-01
    7.ids.xml 1 2004-10-01
    8.ids.xml 100000 2004-10-01
    9.ids.xml 90000 2004-10-01

and the comparison operation as shown, `$b->[1] cmp $a->[1]` (string sort, reversed), this would order as

    2004-10-01 100000
    2004-10-01  90000
    2004-10-01   8000
    2004-10-01   7000
    2004-10-01    600
    2004-10-01    500
    2004-10-01     40
    2004-10-01     30
    2004-10-01      2
    2004-10-01      1

i.e. you get the entry with the highest time value as the first element.

Another way would be to store the date and time values separately

    push @{$hash{$common}}, [ $input_file, $date, $time ];

and then use a generic chained sort operation

    @files = sort {$b->[1] cmp $a->[1] || $b->[2] <=> $a->[2]} @{$hash{$k}};

This works because if the date value is equal, the first comparison (`$b->[1] cmp $a->[1]`) evaluates to zero, so the next comparison (`$b->[2] <=> $a->[2]`) after the logical or "`||`" is tested to determine if the time differs (it kind of "falls through"). Note that in this case the time value must be compared numerically, i.e. with `<=>`, or else (with string comparison `cmp`) the `100000` would be ordered in between `1` and `2`.

See sort.
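
To check the two approaches against each other, here is a small self-contained sketch using the ten sample records from this reply. The variable names (`@recs`, `@by_key`, `@by_chain`, `@by_cmp`) are made up for illustration, and each record is stored as `[name, time, date]`, so the field indices differ from the snippets above.

```perl
use strict;
use warnings;

# Sample records as [name, time, date], taken from the input in this reply.
my @recs = map { [ split ' ', $_ ] } (
    '0.ids.xml 500 2004-10-01',
    '1.ids.xml 2 2004-10-01',
    '2.ids.xml 30 2004-10-01',
    '3.ids.xml 600 2004-10-01',
    '4.ids.xml 40 2004-10-01',
    '5.ids.xml 7000 2004-10-01',
    '6.ids.xml 8000 2004-10-01',
    '7.ids.xml 1 2004-10-01',
    '8.ids.xml 100000 2004-10-01',
    '9.ids.xml 90000 2004-10-01',
);

# Approach 1: build one padded "date time" key and compare it as a string.
my @by_key = map  { $_->[0] }
             sort { $b->[1] cmp $a->[1] }
             map  { [ $_->[0], sprintf("%s %6d", $_->[2], $_->[1]) ] } @recs;

# Approach 2: keep the fields separate; date as string, time numerically.
my @by_chain = map  { $_->[0] }
               sort { $b->[2] cmp $a->[2] || $b->[1] <=> $a->[1] } @recs;

# Pitfall: comparing the unpadded time as a string puts 100000 between 2 and 1.
my @by_cmp = map  { $_->[0] }
             sort { $b->[2] cmp $a->[2] || $b->[1] cmp $a->[1] } @recs;

print "padded key: @by_key\n";
print "chained:    @by_chain\n";
print "cmp only:   @by_cmp\n";
```

The first two lists agree and start with `8.ids.xml` (time 100000), while the string-only comparison starts with `9.ids.xml` (time 90000). The padded key relies on the time fitting into six digits; the chained sort has no such limit.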