leoberbert has asked for the wisdom of the Perl Monks concerning the following question:

Good evening! I am facing a problem in Perl. I have the following files in a directory, where the delimiter is "_" and the third field is the date on which the file was created. I always need to pick the file with the earliest date for each code in the first column. Example:

    1020300000_XXXXXXXXX_20160707193000.TXT
    1020300000_XXXXXXXXX_20160707170000.TXT
    1020400000_XXXXXXXXX_20160707180000.TXT
    1020400000_XXXXXXXXX_20160707190000.TXT

In this case I need the older files as the result:

    1020300000_XXXXXXXXX_20160707193000.TXT
    1020400000_XXXXXXXXX_20160707180000.TXT

Could someone help me return only the older files? Regards,

Replies are listed 'Best First'.
Re: Sort files by date in the file name.
by Marshall (Canon) on Jul 08, 2016 at 07:34 UTC
    Is this what you needed? I'm not sure.

    Update: I saw the post from johngg at Re^7: Sort files by date in the file name. I had completely missed the fact that the date format is different in this second data set. If this is the format, then a simple numeric date comparison won't work and something more involved is required. However, when I see a date like 28062017102508, in 2017, that makes me think that this data was hand generated. Note to OP: when you post example data, make it as close to the "real thing" as possible. These details matter.

    Also, the sort method I show below is basic. Although straightforward, it has a built-in inefficiency: the splits are run every time the sort algorithm selects a pair of items for $a and $b. The Schwartzian Transform as shown by kcott avoids this by calculating the split once for each item, before sorting. So it is faster, but at the cost of more complexity. I recommend that a beginner master the basics first, then get fancy once there is a solid foundation.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my @files;
    while (<DATA>)
    {
       chomp;
       push @files, $_;
    }
    @files = sort by_first_third @files;

    sub by_first_third
    {
       my ($Afirst,$Athird) = (split '_|\.',$a)[0,2];
       my ($Bfirst,$Bthird) = (split '_|\.',$b)[0,2];

       $Afirst <=> $Bfirst
         or
       $Athird <=> $Bthird
    }
    print join ("\n",@files), "\n\n";
    print "first file: $files[0]", "\n";
    print "last file : $files[-1]", "\n";

    =prints
    971305332_XXXXXX12345678765463565E_28062011102508.TXT
    971305332_AAAAAAAA12345678765463565E_28062011102508.TXT
    971305332_CCCC12345678765463565E_28062011102508.TXT
    971305332_CCCC12345678765463565E_28062020102508.TXT
    981139804_ABCDEF12345678765463565E_28062016102508.TXT
    981139804_ABCDEF12345678765463565E_28062016112508.TXT
    981139804_ABCDEF12345678765463565E_28062016172508.TXT
    981139804_ABCDEF12345678765463565E_28062017102508.TXT

    first file: 971305332_XXXXXX12345678765463565E_28062011102508.TXT
    last file : 981139804_ABCDEF12345678765463565E_28062017102508.TXT
    =cut

    __DATA__
    981139804_ABCDEF12345678765463565E_28062016172508.TXT
    981139804_ABCDEF12345678765463565E_28062016112508.TXT
    981139804_ABCDEF12345678765463565E_28062016102508.TXT
    981139804_ABCDEF12345678765463565E_28062017102508.TXT
    971305332_XXXXXX12345678765463565E_28062011102508.TXT
    971305332_AAAAAAAA12345678765463565E_28062011102508.TXT
    971305332_CCCC12345678765463565E_28062011102508.TXT
    971305332_CCCC12345678765463565E_28062020102508.TXT

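As the update notes, a DDMMYYYYhhmmss stamp cannot be compared as a plain number: 01072016 (1 July 2016) is numerically smaller than 28062016 (28 June 2016) although it is the later date. A minimal sketch of one way around this, assuming every name really has the shape CODE_TEXT_DDMMYYYYhhmmss.TXT: rearrange the stamp into YYYYMMDDhhmmss before comparing. The helper names `sort_key` and `by_code_then_date` and the 01072016 sample entry are made up for illustration and are not from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the column-1 code and a
# sortable date for a name shaped like CODE_TEXT_DDMMYYYYhhmmss.TXT.
sub sort_key {
    my ($name) = @_;
    my ($code, $stamp) = (split /[_.]/, $name)[0, 2];

    # Reorder DDMMYYYYhhmmss into YYYYMMDDhhmmss; die on any other shape.
    my ($d, $m, $y, $t) = $stamp =~ /\A(\d{2})(\d{2})(\d{4})(\d{6})\z/
        or die "Unexpected timestamp in '$name'\n";
    return ($code, "$y$m$d$t");
}

# Sort routine: column 1 numerically, then the reordered date as a string
# (fixed-width digit strings compare correctly with cmp).
sub by_code_then_date {
    my ($a_code, $a_date) = sort_key($a);
    my ($b_code, $b_date) = sort_key($b);
    return $a_code <=> $b_code || $a_date cmp $b_date;
}

# Assumed sample data; the 01072016 entry is invented to show the problem.
my @files = qw(
    981139804_ABCDEF12345678765463565E_28062016172508.TXT
    981139804_ABCDEF12345678765463565E_01072016102508.TXT
    981139804_ABCDEF12345678765463565E_28062017102508.TXT
    971305332_CCCC12345678765463565E_28062020102508.TXT
    971305332_CCCC12345678765463565E_28062011102508.TXT
);

my @sorted = sort by_code_then_date @files;
print "$_\n" for @sorted;
```

Like the basic sort above, this recomputes the keys on every comparison, so it trades speed for readability.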
Re: Sort files by date in the file name.
by kcott (Archbishop) on Jul 08, 2016 at 08:43 UTC

    G'day leoberbert,

    I think this does what you want:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my @files = qw{
        1020300000_XXXXXXXXX_20160707193000.TXT
        1020300000_XXXXXXXXX_20160707170000.TXT
        1020400000_XXXXXXXXX_20160707180000.TXT
        1020400000_XXXXXXXXX_20160707190000.TXT
    };

    my @sorted =
        map { $_->[0] }
        sort { $a->[1] <=> $b->[1] || $a->[3] <=> $b->[3] }
        map { [ $_ => split /[_.]/ ] }
        @files;

    print join "\n", '@files:', @files;
    print join "\n", '@sorted:', @sorted;

    Output:

    @files:
    1020300000_XXXXXXXXX_20160707193000.TXT
    1020300000_XXXXXXXXX_20160707170000.TXT
    1020400000_XXXXXXXXX_20160707180000.TXT
    1020400000_XXXXXXXXX_20160707190000.TXT
    @sorted:
    1020300000_XXXXXXXXX_20160707170000.TXT
    1020300000_XXXXXXXXX_20160707193000.TXT
    1020400000_XXXXXXXXX_20160707180000.TXT
    1020400000_XXXXXXXXX_20160707190000.TXT

    The construct I've used here is known as a Schwartzian Transform.

    This sorts first on column1 and then on column3, both in ascending order; this should put your wanted file in $sorted[0]. If that's not what you want, modify the sort { ... } code to suit.
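If what is wanted is the oldest file for each column-1 code rather than a single file, one possible extension is to walk the sorted list and keep only the first entry seen per code. This is a sketch under the assumption that the dates are YYYYMMDDhhmmss as in the original question; the `%seen` and `@oldest` names are illustrative.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @files = qw{
    1020300000_XXXXXXXXX_20160707193000.TXT
    1020300000_XXXXXXXXX_20160707170000.TXT
    1020400000_XXXXXXXXX_20160707180000.TXT
    1020400000_XXXXXXXXX_20160707190000.TXT
};

# Same Schwartzian Transform as in the reply: sort on column 1, then column 3.
my @sorted =
    map  { $_->[0] }
    sort { $a->[1] <=> $b->[1] || $a->[3] <=> $b->[3] }
    map  { [ $_ => split /[_.]/ ] }
    @files;

# Within each code the oldest file now comes first, so keep only the
# first entry seen for every column-1 value.
my %seen;
my @oldest = grep { !$seen{ (split /_/)[0] }++ } @sorted;

print "$_\n" for @oldest;
```

With the four sample names this keeps the 20160707170000 file for 1020300000 and the 20160707180000 file for 1020400000.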

    — Ken

Re: Sort files by date in the file name.
by johngg (Canon) on Jul 08, 2016 at 16:46 UTC

    The following code finds the oldest file for each "first column" set; if several files share the oldest date, it displays only the first in lexical order. It is not clear whether the date format is YYYYMMDD as in the OP or DDMMYYYY as per the later posted data; I have gone with the latter.

    johngg@shiraz:~/perl/Monks > perl -Mstrict -Mwarnings -E '
    my @files = qw{
        981139804_ABCDEF12345678765463565E_28062016172508.TXT
        981139804_ABCDEF12345678765463565E_28062016192508.TXT
        981139804_ABCDEF12345678765463565E_28062016112508.TXT
        981139804_ABCDEF12345678765463565E_28062016102508.TXT
        981139804_ABCDEF12345678765463565E_28062017102508.TXT
        971305332_XXXXXX12345678765463565E_28062011102508.TXT
        971305332_AAAAAAAA12345678765463565E_28062011102508.TXT
        971305332_CCCC12345678765463565E_28062011102508.TXT
        971305332_CCCC12345678765463565E_28062020102508.TXT
    };
    my @oldest = do {
        my %seen;
        grep { not $seen{ ( split m{_} )[ 0 ] } ++ }
        map  { substr $_, 23 }   # drop the 23-character sort prefix (9 + 4 + 2 + 2 + 6)
        sort
        map  {
            my( $col1, $dt ) = ( split m{[_.]} )[ 0, 2 ];
            my( $d, $m, $y, $t ) = unpack q{a2 a2 a4 a6}, $dt;
            pack q{a9 a4 a2 a2 a6 a*}, $col1, $y, $m, $d, $t, $_;
        }
        @files;
    };
    say for @oldest;'
    971305332_AAAAAAAA12345678765463565E_28062011102508.TXT
    981139804_ABCDEF12345678765463565E_28062016102508.TXT

    I hope this guess at your intent is correct.
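For comparison, the same result can be reached without sorting the whole list, by keeping the smallest key seen so far for each code in a hash. This is a different technique from the pack/sort/substr pipeline above, sketched under the same DDMMYYYYhhmmss assumption; the `%oldest` and `@result` names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @files = qw{
    981139804_ABCDEF12345678765463565E_28062016172508.TXT
    981139804_ABCDEF12345678765463565E_28062016192508.TXT
    981139804_ABCDEF12345678765463565E_28062016112508.TXT
    981139804_ABCDEF12345678765463565E_28062016102508.TXT
    981139804_ABCDEF12345678765463565E_28062017102508.TXT
    971305332_XXXXXX12345678765463565E_28062011102508.TXT
    971305332_AAAAAAAA12345678765463565E_28062011102508.TXT
    971305332_CCCC12345678765463565E_28062011102508.TXT
    971305332_CCCC12345678765463565E_28062020102508.TXT
};

my %oldest;    # column-1 code => [ comparison key, file name ]
for my $file (@files) {
    my ($code, $stamp) = (split /[_.]/, $file)[0, 2];
    my ($d, $m, $y, $t) = unpack 'a2 a2 a4 a6', $stamp;

    # YYYYMMDDhhmmss followed by the file name, so that ties on the date
    # are broken by lexical order of the name.
    my $key = "$y$m$d$t$file";

    $oldest{$code} = [ $key, $file ]
        if !exists $oldest{$code} || $key lt $oldest{$code}[0];
}

# One file per code, listed in ascending numeric order of the code.
my @result = map { $oldest{$_}[1] } sort { $a <=> $b } keys %oldest;
print "$_\n" for @result;
```

This makes a single pass over the names and only sorts the distinct codes at the end, which may matter for a large directory.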

    Cheers,

    JohnGG

Re: Sort files by date in the file name.
by Anonymous Monk on Jul 07, 2016 at 23:38 UTC

    What is the "first column code"?

      A number that must be used as a key ..... The first and last column should be the sort keys.

        A number that must be used as a key ..... The first and last column should be the sort keys.

        :) you gave sample input and sample output

        the sample output does not appear in the sample input

        It's not clear what you mean by "first column code", and I cannot deduce or guess it from the sample you provided.

        The explanations in words do not clarify which part is which.