Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks!

I have a directory /my_dir with a bunch of subdirectories, created every month, named like this:

12112014
01052015
02202015
03102015
01012011
04092015
09092015

I am trying to find which directory is the latest and open it. What would be the best way to do that?
sub direc {
    ...
    my $dir = "/my_dir";
    opendir DIR, $dir;

    # Read in all directories in /my_dir first
    my @month_dir = grep /^\d{8}$/, readdir(DIR);
    my $current_dir = $month_dir[0] || '';

    # I need to open only the latest directory here to continue
    # where I am stuck
    my $latest_dir = ...

    # Now read in all files in latest directory found
    opendir LATESTDIR, "$dir/$latest_dir";

    # Read in all files, but ignore '.' and '..'
    my @files = grep !/^\.{1,2}$/, readdir(LATESTDIR);
    my @files;
    foreach my $file (@files) {
        # I only want txt
        next unless ($file =~ m/\.txt$/);
        push @files, "$file";
    }
    closedir(LATESTDIR);
    closedir(DIR);
    return \@files;
}

Thanks for the help!

Replies are listed 'Best First'.
Re: Sort directories by date
by Corion (Patriarch) on Sep 28, 2015 at 17:49 UTC
Re: Sort directories by date
by graff (Chancellor) on Sep 29, 2015 at 03:53 UTC
    Believe it or not, your question may actually be ambiguous. The "latest" directory could either be the one whose name represents the most recent month-day-year date, OR the one whose modification date is the most recent (i.e. the one that has most recently had a file added, deleted, renamed, etc.). The other replies above all take it for granted that the first of these two was your intended meaning.

    But maybe those two possible interpretations happen to yield the same relative ordering of directories - that is, it could be that no changes ever occur in the contents of "09302015" once "10012015" is created (and likewise for the latter, once "10022015" is created). If this is true, you can sort the directories based on their modification dates:

    my $dir = "/my_dir";
    opendir DIR, $dir or die "$dir: $!\n";
    my %subdir;
    while ( my $month_dir = readdir( DIR ) ) {
        next unless ( $month_dir =~ /^\d{8}$/ );
        # readdir returns bare names, so -M needs the full path
        $subdir{$month_dir} = -M "$dir/$month_dir";
    }
    # -M is the age in days, so the smallest value is the most recent
    my $latest_dir = ( sort {$subdir{$a}<=>$subdir{$b}} keys %subdir )[0];
    # ...
    Of course, if you really, truly meant for the actual directory names to be the basis for sorting, then a Schwartzian-style transform (rearrange each name into a sortable form, sort, then restore it) is probably the most economical. Instead of using the hash and while loop shown above, just do this:
    my $latest_dir = ( map  { s/(....)(....)/$2$1/; $_ }
                       sort
                       map  { s/(....)(....)/$2$1/; $_ }
                       grep /^\d{8}$/, readdir DIR )[-1];
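The two readings of "latest" described above can disagree, and a throwaway directory is enough to see it. What follows is only a sketch: the subs latest_by_mtime and latest_by_name are hypothetical names, the temporary directory stands in for /my_dir, and the modification times are set artificially with utime.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Newest subdirectory by modification time (largest mtime wins).
sub latest_by_mtime {
    my ($dir) = @_;
    opendir my $dh, $dir or die "$dir: $!\n";
    my @names = grep { /^\d{8}$/ && -d "$dir/$_" } readdir $dh;
    closedir $dh;
    my ($latest, $latest_mtime);
    for my $name (@names) {
        my $mtime = (stat "$dir/$name")[9];
        ($latest, $latest_mtime) = ($name, $mtime)
            if !defined $latest_mtime || $mtime > $latest_mtime;
    }
    return $latest;
}

# Newest subdirectory by the MMDDYYYY date encoded in its name.
sub latest_by_name {
    my ($dir) = @_;
    opendir my $dh, $dir or die "$dir: $!\n";
    my @names = grep { /^\d{8}$/ && -d "$dir/$_" } readdir $dh;
    closedir $dh;
    # Compare on a YYYYMMDD key, newest first, leaving the names untouched.
    my ($latest) = sort { substr($b, 4) . substr($b, 0, 4)
                      cmp substr($a, 4) . substr($a, 0, 4) } @names;
    return $latest;
}

# Demo data (hypothetical): the older-named directory was touched later.
my $root  = tempdir(CLEANUP => 1);
my %mtime = ('09092015' => 1_000_000, '01012011' => 2_000_000);
for my $name (keys %mtime) {
    mkdir "$root/$name" or die "$root/$name: $!\n";
    utime $mtime{$name}, $mtime{$name}, "$root/$name" or die "utime: $!\n";
}
print "by name:  ", latest_by_name($root), "\n";
print "by mtime: ", latest_by_mtime($root), "\n";
```

With this demo data the name-based answer is 09092015 while the mtime-based answer is 01012011, which is exactly the ambiguity in question.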
Re: Sort directories by date
by poj (Abbot) on Sep 28, 2015 at 18:52 UTC
    #!perl
    use strict;

    sub direc {
        my $dir = "/my_dir";
        opendir DIR, $dir;

        # Read in all directories in /my_dir first
        my @month_dir = grep /^\d{8}$/, readdir(DIR);
        my %ymd = map {/(\d{4})(\d{4})/;$2.$1,$_} @month_dir ;
        my $latest_ymd = (sort keys %ymd)[-1];
        my $latest_dir = $ymd{$latest_ymd};
        closedir(DIR);

        # Now read in .txt files in latest directory found
        opendir LATESTDIR, "$dir/$latest_dir";
        my @files = grep /\.txt$/, readdir(LATESTDIR);
        closedir(LATESTDIR);
        return \@files;
    }
    poj
Re: Sort directories by date
by RichardK (Parson) on Sep 28, 2015 at 18:35 UTC

    It would have been much easier if you'd created the directory names as 'YYYYMMDD' then they would have sorted naturally with little effort.
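To illustrate the point about 'YYYYMMDD' names, here is a small sketch that converts the existing MMDDYYYY names and then relies on a plain string sort. The helper to_ymd is a hypothetical name used only for this example.

```perl
use strict;
use warnings;

# Convert an MMDDYYYY name to YYYYMMDD (hypothetical helper).
sub to_ymd {
    my ($mdy) = @_;
    my ($md, $y) = $mdy =~ /^(\d{4})(\d{4})$/
        or die "unexpected directory name: $mdy\n";
    return $y . $md;
}

my @mdy = qw(12112014 01052015 02202015 03102015 01012011 04092015 09092015);

# Once the year leads, the default string sort is also chronological.
my @ymd = sort map { to_ymd($_) } @mdy;

print "$_\n" for @ymd;
print "latest: $ymd[-1]\n";
```

The last element after sorting is the latest date, with no custom comparison block needed.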

    BTW, opendir/readdir are low level and a pain to use, so they are best avoided unless you really need them. You could try File::Find::Rule instead:

    use v5.20;
    use warnings;
    use File::Find::Rule;

    my @dirs = File::Find::Rule->directory()->maxdepth(1)->in('.');
    say $_ for @dirs;
Re: Sort directories by date
by Laurent_R (Canon) on Sep 29, 2015 at 10:30 UTC
    Although it may not matter that much (depending on how many subdirectories you have in your root directory), sorting the whole list just to get the most recent item is somewhat inefficient, even with a fast sorting algorithm and using Schwartzian transform or Guttman-Rosler transform, because it requires the computer to do much more work than what is actually needed.

    I usually would not care that much about that with a short list of subdirectories, but it sometimes matters that there are more efficient algorithms to pick out the latest (or largest, or smallest, whatever) element in a list.

    For example, something like this at the command line (quick test):

    $ perl -e '
    > my @list = qw/12112014
    > 01052015
    > 02202015
    > 03102015
    > 01012011
    > 10102014
    > 04092015
    > 09092015
    > 09092013/;
    > chomp @list;
    > my $max_y = "0000";
    > my $max_d = "0000";
    > for my $dir (@list) {
    >     my ($d, $y) = $dir =~ /(\d{4})(\d{4})/;
    >     if ($y > $max_y) {
    >         $max_y = $y;
    >         $max_d = $d;    # new latest year: restart the month-day maximum
    >     } elsif ($y == $max_y) {
    >         $max_d = $d if $d > $max_d;
    >     }
    > }
    > print "$max_d$max_y\n";
    > '
    09092015
    This may look slightly more complex, but it is more efficient for a long list of directories. Which is why I would care only if the list is long.
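The same single-pass idea can be written as a reusable sub. This is a sketch rather than a tuned implementation: latest_mdy is a hypothetical name, and it assumes every candidate is an eight-digit MMDDYYYY string (anything else is skipped).

```perl
use strict;
use warnings;

# Return the latest MMDDYYYY name in one pass over the list, without sorting.
sub latest_mdy {
    my ($best, $best_key);
    for my $name (@_) {
        my ($md, $y) = $name =~ /^(\d{4})(\d{4})$/ or next;  # skip other names
        my $key = "$y$md";                                   # YYYYMMDD
        ($best, $best_key) = ($name, $key)
            if !defined $best_key || $key gt $best_key;
    }
    return $best;
}

my @list = qw(12112014 01052015 02202015 03102015 01012011
              10102014 04092015 09092015 09092013);
print latest_mdy(@list), "\n";
```

Comparing one combined YYYYMMDD key avoids tracking the year and the month-day maxima separately, so there is only one piece of state to keep consistent.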

      It just occurred to me that a GRTish, guaranteed-single-pass solution is possible:

      c:\@Work\Perl\monks>perl -wMstrict -le
      "use List::Util qw(maxstr);
       ;;
       my @dates = qw(
         12112014 12012014 01052015 12202014 12022014
         02202015 03102015 01012011 09092015 04092015
       );
       ;;
       my $most_recent =
         unpack 'x4a*', maxstr
         map pack('a4a*', unpack('x4a4', $_), $_),
         @dates
         ;
       ;;
       print $most_recent;
      "
      09092015
      Use minstr for the least-recent date. See the core module List::Util. No efficiency/performance testing done nor claims made. (Update: Actually, I'd be surprised if there's any advantage unless you're dealing with really large lists; default sort (with no subroutine block) is pretty fast!)

      Update: See also pack, unpack, perlpacktut.
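For readers who find the pack/unpack templates dense, the same three steps can be spelled out with substr. This is a sketch of the idea rather than a replacement for the one-liner; most_recent is a hypothetical name.

```perl
use strict;
use warnings;
use List::Util qw(maxstr);

sub most_recent {
    return unless @_;
    my @keyed = map {
        my $year = substr $_, 4, 4;   # same as unpack('x4a4', $_)
        $year . $_;                   # same as pack('a4a*', $year, $_)
    } @_;
    my $max = maxstr @keyed;          # one pass, plain string comparison
    return substr $max, 4;            # same as unpack('x4a*', $max)
}

my @dates = qw(12112014 12012014 01052015 12202014 12022014
               02202015 03102015 01012011 09092015 04092015);
print most_recent(@dates), "\n";
```

Each name gets its year prepended (12112014 becomes 201412112014), so the strings compare as YYYYMMDD followed by a redundant year, and stripping the first four characters recovers the original name.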


      Give a man a fish:  <%-{-{-{-<

        Yes, it is a fairly nice way of doing it.++

        I had also been thinking about some similar form of GR-like transform (though I had not thought about using the maxstr function of List::Util), but I finally preferred to make my algorithmic point with a simple, straightforward manual search for the maximum date.

        I also agree that the various ways of doing that have little consequence on performance unless the list is really very long.

Re: Sort directories by date
by karlgoethebier (Abbot) on Sep 29, 2015 at 18:25 UTC

    As RichardK wrote above:

    "...It would have been much easier if you'd created the directory names as 'YYYYMMDD' then they would have sorted naturally with little effort...."

    But I guess that it is what it is: you can't change that any more.

    Here is another idea for sorting these directories that doesn't require any dark Klingon maneuvers ;-)

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use feature qw (say);

    my @dates = qw{ 12112014 01052015 02202015 03102015 01012011 04092015 09092015 };

    my %hash;

    for (@dates) {
        my ( $m, $d, $y ) = unpack q(a2a2a4);
        $hash{qq($y$m$d)} = $_;
    }

    for ( sort { $a <=> $b } keys(%hash) ) {
        say $hash{$_};
    }

    __END__

    karls-mac-mini:monks karl$ ./dirs.pl
    01012011
    12112014
    01052015
    02202015
    03102015
    04092015
    09092015

    Please see also Path::Iterator::Rule, File::Basename, unpack, pack as well as perlpacktut.

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

      Yeah, but why would you want to populate a hash and use sort, when you already have everything at hand and only need one very small extra step to find the max value?
      #!/usr/bin/env perl

      use strict;
      use warnings;
      use feature qw (say);

      my @dates = qw{ 12112014 01052015 02202015 03102015 01012011 04092015 09092015 };

      my $max_date = 0;
      my $result;

      for (@dates) {
          my $date = join "", reverse unpack q(a4a4);
          $result = $_ and $max_date = $date if $date > $max_date;
      }

      print $result;
      Update: fixed a mistake (missing reverse) in the my $date = ... code line above. Thanks to poj for pointing out the error.
        "...why would you want to populate a hash and use sort..."

        Because I had no better idea :-(

        Update: No, wait:

        use strict;
        use warnings;
        use feature qw (say);
        use List::Util qw(max);
        use Time::Piece;

        my @dates = qw{ 12112014 01052015 02202015 03102015 01012011 04092015 09092015 };

        # strptime parses the date as UTC, so convert the epoch back with
        # gmtime; localtime could shift the day in zones west of UTC
        say gmtime( max map { Time::Piece->strptime( $_, "%m%d%Y" )->epoch } @dates )
            ->strftime("%m%d%Y");

        __END__

        \Desktop\monks>other_idea.pl
        09092015

        Best regards, Karl

        «The Crux of the Biscuit is the Apostrophe»

Re: Sort directories by date
by Anonymous Monk on Sep 28, 2015 at 22:50 UTC
    #!/usr/bin/perl

    # http://perlmonks.org/?node_id=1143273

    use strict;
    use warnings;

    my @dirs = qw( 12112014 01052015 02202015 03102015 01012011 04092015 09092015 );

    my $latest = (sort { $a % 1e4 <=> $b % 1e4 } sort @dirs)[-1];

    print "latest dir $latest\n";

    Because sort is stable :)
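Why stability matters here can be seen by splitting the double sort into its two passes. This is a sketch with a hypothetical sub name, sort_mdy, and it relies on ties keeping their earlier order, which holds for the merge sort used by current perls.

```perl
use strict;
use warnings;

sub sort_mdy {
    # Pass 1: a plain string sort orders the names by their MMDD prefix.
    my @by_mmdd = sort @_;

    # Pass 2: numeric sort on the last four digits (the year). Names with
    # the same year compare equal, so they keep their pass-1 MMDD order.
    return sort { $a % 1e4 <=> $b % 1e4 } @by_mmdd;
}

my @dirs = qw(12112014 01052015 02202015 03102015 01012011 04092015 09092015);
print "$_\n" for sort_mdy(@dirs);
```

With an unstable sort the second pass would be free to shuffle the 2015 entries among themselves, and the last element would no longer be guaranteed to be the latest date.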

      The rest of your sub is much simpler.

      return [ glob "/my_dir/$latest/*.txt" ];
      }

      (untested)

        Sort of like this...

        sub direc {
            my $dir = "/my_dir";
            # glob returns full paths; keep only the eight-digit names
            my @dirs = map m[/(\d{8})\z], glob "$dir/*";
            my $latest = (sort { $a % 1e4 <=> $b % 1e4 } sort @dirs)[-1];
            [ glob "$dir/$latest/*.txt" ];
        }

        (untested)

Re: Sort directories by date
by locked_user sundialsvc4 (Abbot) on Sep 28, 2015 at 19:58 UTC

    Or the short-answer, for those who don’t want to answer a quiz to get it, would be ... to use a sort-compare subroutine, inline or otherwise, along these lines:   (untested)

    sort {
        substr($a, 4, 4) cmp substr($b, 4, 4)
        || substr($a, 2, 2) cmp substr($b, 2, 2)
        || substr($a, 0, 2) cmp substr($b, 0, 2)
    } ...

    The sort verb accepts as its first argument a function that, given two “magic variables” $a and $b, must return a value that is less than, equal to, or greater than zero.   The <=> (numeric) and cmp (string) operators are specifically designed for this purpose.   Here, in a simple in-line subroutine, we use the || logic-OR operator, which we know uses “short circuiting,” to return the first of three alternatives that is not zero.   First, we compare the year.   Then, the month, then the day.   (The first position in a Perl string is position zero.)
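The short-circuit behaviour is easier to see when the comparison is pulled out into a named sub and called directly. This sketch assumes the names are MMDDYYYY, as in the question, so the month sits at offset 0 and the day at offset 2; by_mdy is a hypothetical name.

```perl
use strict;
use warnings;

# Compare two MMDDYYYY strings: year first, then month, then day.
# Each cmp yields -1, 0 or 1, and || moves on only when the result is 0.
sub by_mdy {
    my ($left, $right) = @_;
    return substr($left, 4, 4) cmp substr($right, 4, 4)
        || substr($left, 0, 2) cmp substr($right, 0, 2)
        || substr($left, 2, 2) cmp substr($right, 2, 2);
}

print by_mdy('12112014', '01052015'), "\n";   # years differ, decided at once
print by_mdy('02202015', '03102015'), "\n";   # same year, months decide
print by_mdy('02202015', '02202015'), "\n";   # all three fields equal

my @sorted = sort { by_mdy($a, $b) }
             qw(12112014 01052015 02202015 03102015 01012011 04092015 09092015);
print "latest: $sorted[-1]\n";
```

The three direct calls print -1, -1 and 0, and the sorted list ends with 09092015.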

      Unfortunately, you have your month and day fields transposed: the third date in the list shows that the format is MMDDYYYY, there not being a month 20 in the year!

      If there are many directories to sort there might be some benefit in using a more advanced sorting approach rather than repeatedly substr'inging the same fields as each date is compared with others in turn. I have used unpack as an alternative to substr in the following code. A Schwartzian transform:-

      $ perl -Mstrict -Mwarnings -E '
      my @dates = qw{
          12112014 01052015 02202015 03102015
          01012011 04092015 09092015
      };
      say for
          map  { $_->[ 0 ] }
          sort {
              $a->[ 3 ] <=> $b->[ 3 ]
              ||
              $a->[ 1 ] <=> $b->[ 1 ]
              ||
              $a->[ 2 ] <=> $b->[ 2 ]
          }
          map  { [ $_, unpack q{a2a2a4}, $_ ] }
          @dates;'
      01012011
      12112014
      01052015
      02202015
      03102015
      04092015
      09092015
      $

      Guttman Rosler transform:-

      $ perl -Mstrict -Mwarnings -E '
      my @dates = qw{
          12112014 01052015 02202015 03102015
          01012011 04092015 09092015
      };
      say for
          map  { substr $_, 8 }
          sort
          map  { join q{}, ( unpack q{a2a2a4}, $_ )[ 2, 0, 1 ], $_ }
          @dates;'
      01012011
      12112014
      01052015
      02202015
      03102015
      04092015
      09092015
      $

      I hope this is of interest.

      Cheers,

      JohnGG