in reply to Delete the file with checking the value

#!/usr/bin/perl

my %hash;

while (<DATA>) {
    my ($input_file, $time, $date) = split ' ';

    # determine common part of file names
    my $common = $input_file;
    $common =~ s/^\d+//;  # remove leading number

    # collect info (name, timestamp) in hash, keyed by common part of file names
    push @{$hash{$common}}, [ $input_file, sprintf("%s %6d",$date,$time) ];
}

# use Data::Dumper;
# print Dumper \%hash;  # debug

for my $k (keys %hash) {  # for all file sets
    # sort by timestamp
    my @files = sort {$b->[1] cmp $a->[1]} @{$hash{$k}};
    # remove most recent file (the one to keep) from list
    shift @files;
    # delete remaining (older) files
    unlink map $_->[0], @files;
}

__DATA__
508.ids.xml 70857 2004-10-02
1508.ids.xml 70859 2004-10-02
1509.id123.xml 2000 2004-10-02
1400.id123.xml 4000 2004-10-01

(I left out the XML stuff, as the OP doesn't seem to have problems with that part.)

Re^2: Delete the file with checking the value
by Anonymous Monk on Feb 28, 2010 at 05:40 UTC
    Thanks for the help.
    __DATA__
    508.ids.xml 70857 2004-10-01
    1508.ids.xml 70859 2004-10-01
    1509.id123.xml 2000 2004-10-01
    1400.id123.xml 4000 2004-10-01
    How do I compare the times if the dates are the same?

      As usual, there are several ways to do it. The sample code already handles this case: sprintf("%s %6d",$date,$time) appends the time to the date string, right-aligned in a six-character field and padded with spaces on the left, so that the whole string can simply be sorted asciibetically to yield the proper result. This relies on space (ASCII 32) ordering before the digits (ASCII 48..57).

      With the following sample input

      __DATA__
      0.ids.xml 500 2004-10-01
      1.ids.xml 2 2004-10-01
      2.ids.xml 30 2004-10-01
      3.ids.xml 600 2004-10-01
      4.ids.xml 40 2004-10-01
      5.ids.xml 7000 2004-10-01
      6.ids.xml 8000 2004-10-01
      7.ids.xml 1 2004-10-01
      8.ids.xml 100000 2004-10-01
      9.ids.xml 90000 2004-10-01

      and the comparison operation as shown, $b->[1] cmp $a->[1] (string sort, reversed), this would order as

      2004-10-01 100000
      2004-10-01  90000
      2004-10-01   8000
      2004-10-01   7000
      2004-10-01    600
      2004-10-01    500
      2004-10-01     40
      2004-10-01     30
      2004-10-01      2
      2004-10-01      1

      i.e. you get the entry with the highest time value as the first element.
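
To see why the padding matters, here is a small sketch that sorts the same keys with and without it. The time values are a made-up subset of the sample input, and the variable names are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical subset of the sample times, all on the same date.
my @times = (500, 2, 30, 100000, 90000);
my $date  = '2004-10-01';

# Unpadded keys: cmp decides on the first differing character,
# so "90000" sorts above "100000".
my @unpadded = sort { $b cmp $a } map { "$date $_" } @times;

# Padded keys: %6d right-aligns each time in a 6-character field,
# so shorter numbers start with spaces, which order before digits.
my @padded = sort { $b cmp $a } map { sprintf("%s %6d", $date, $_) } @times;

print "unpadded: $_\n" for @unpadded;
print "padded:   $_\n" for @padded;
```

Only the padded list comes out in true descending time order. Keep in mind that %6d sets a minimum width, so a time value with more than six digits would break the alignment; the field would need to be at least as wide as the longest value.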

      Another way would be to store the date and time values separately

      push @{$hash{$common}}, [ $input_file, $date, $time ];

      and then use a generic chained sort operation

      @files = sort {$b->[1] cmp $a->[1] || $b->[2] <=> $a->[2]} @{$hash{$k}};

      This works because if the date values are equal, the first comparison ($b->[1] cmp $a->[1]) evaluates to zero, so the second comparison ($b->[2] <=> $a->[2]) after the logical or "||" decides the order (it kind of "falls through"). Note that in this case the time value must be compared numerically, i.e. with <=>; with the string comparison cmp, 100000 would be ordered between 1 and 2. See perldoc -f sort.
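
The chained comparison can be tried out on a few records. This is only a sketch: the file names and values below are invented for the demonstration, and the second sort deliberately uses cmp on the time to show the misordering.

```perl
use strict;
use warnings;

# Hypothetical records: [ file name, date, time ]
my @records = (
    [ 'a.xml', '2004-10-01', 2      ],
    [ 'b.xml', '2004-10-01', 100000 ],
    [ 'c.xml', '2004-10-02', 1      ],
    [ 'd.xml', '2004-10-01', 1      ],
);

# Newest first: compare dates as strings, then times as numbers on a tie.
my @numeric = sort { $b->[1] cmp $a->[1] || $b->[2] <=> $a->[2] } @records;

# Same chain, but comparing the time as a string (the mistake to avoid).
my @stringy = sort { $b->[1] cmp $a->[1] || $b->[2] cmp $a->[2] } @records;

print "numeric: ", join(' ', map { $_->[0] } @numeric), "\n";
print "string:  ", join(' ', map { $_->[0] } @stringy), "\n";
```

With <=> the record with time 100000 comes right after the one with the later date; with cmp it ends up between the records with times 2 and 1.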