moesplace has asked for the wisdom of the Perl Monks concerning the following question:

I have Solaris 10, ksh88 and perl 5.8.4, and I want to run shell commands inside date-named directories:

20110219 20110220 20110221 20110222 20110223 20110224 20110225 20110226 20110227 20110228 20110301 20110302 20110303

Many, many more directories exist. Requirements: take a start date and an end date, and process every directory in that range, working backwards from the end date to the start date. I'm wondering what the best way is to cd /dir/archive/20110303, issue a shell command, check for certain other conditions, then go back one day, cd /dir/archive/20110302, issue the same shell command, check, and so on. Should I use Perl (which I'm fairly new at) or ksh? Example code would be appreciated.

The date command available to ksh doesn't support the -d option, and there's no practical way to install GNU date.

Re: processing dates as directories
by kcott (Archbishop) on Aug 10, 2013 at 05:27 UTC

    G'day moesplace,

    Welcome to the monastery.

    The basic tools you need for this are the readdir & reverse functions and the .. operator used as a flip-flop. Here's some skeleton code to show the technique:

    #!/usr/bin/env perl -l

    use strict;
    use warnings;

    my $base_dir   = './pm_proc_dir_dates_rev_dir';
    my $start_date = '20110228';
    my $end_date   = '20110302';

    opendir my $dh, $base_dir or die "Can't open '$base_dir': $!";

    # readdir returns entries in no particular order, so sort before reversing
    for (reverse sort grep { /^[^.]/ } readdir $dh) {
        if (/$end_date$/ .. /$start_date$/) {
            print "+ Processing: $base_dir/$_";
        }
        else {
            print "- Skipping: $base_dir/$_";
        }
    }

    closedir $dh;

    Output:

    $ pm_proc_dir_dates_rev.pl
    - Skipping: ./pm_proc_dir_dates_rev_dir/20110303
    + Processing: ./pm_proc_dir_dates_rev_dir/20110302
    + Processing: ./pm_proc_dir_dates_rev_dir/20110301
    + Processing: ./pm_proc_dir_dates_rev_dir/20110228
    - Skipping: ./pm_proc_dir_dates_rev_dir/20110227

    Here's a listing of the ./pm_proc_dir_dates_rev_dir directory:

    $ ls -1a ./pm_proc_dir_dates_rev_dir
    .
    ..
    20110227
    20110228
    20110301
    20110302
    20110303
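    One caveat with the flip-flop worth spelling out: it only switches on when an entry actually matches the end date, so if that day's directory is missing, nothing at all is processed. The sketch below (a hypothetical in-memory listing with 20110302 deliberately absent, not Ken's test directory) contrasts the flip-flop with a plain numeric comparison, which works because YYYYMMDD names compare correctly as numbers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical listing: 20110302 (the end date) is deliberately missing.
my @dirs = qw(20110227 20110228 20110301 20110303);
my ($start_date, $end_date) = ('20110228', '20110302');

# Flip-flop: turns on only when a name matches $end_date, so with that
# directory missing it never turns on and nothing is selected.
my @flip = grep { /^$end_date$/ .. /^$start_date$/ } reverse sort @dirs;

# Plain numeric comparison: does not depend on the endpoints existing.
my @cmp = grep { $_ >= $start_date && $_ <= $end_date } reverse sort @dirs;

print "flip-flop: @flip\n";
print "compare:   @cmp\n";
```

    With this listing the flip-flop selects nothing, while the comparison selects 20110301 and 20110228. If every date in the range is guaranteed to have a directory, both approaches agree.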

    -- Ken

Re: processing dates as directories
by farang (Chaplain) on Aug 10, 2013 at 02:15 UTC

    I was playing around with this earlier, so I may as well post what I did. It's a potential beginning, with a minimal amount of error checking. It takes a single argument, the start_date, which must be eight digits, and operates on every entry named '20' plus six more digits whose date/name is greater than or equal to start_date. It doesn't sanity-check how many directories will be operated on, though. As is, it can be run to list which directories would be affected by a given command-line argument.

    use strict;
    use warnings;

    my $start_date = $ARGV[0] || 'bogus';
    die "bad argument" unless $start_date =~ /^\d{8}$/;

    my $archive_dir = '/dir/archive/';
    my @files = qx(ls $archive_dir);
    chomp @files;

    # keep only YYYYMMDD names, newest first
    @files = sort { $b <=> $a } grep /^20\d{6}$/, @files;

    for my $file (@files) {
        last if $file < $start_date;
        my $fullpath = $archive_dir . $file;
        print "$fullpath\n";    # for testing

        # Issue shell cmds here, e.g.
        # qx(cp -r $fullpath $fullpath.bkup);
        # to back up the directory before proceeding.
    }

    The sort line puts the @files array in descending numerical order, so when the for loop runs it'll pick them up from newest to oldest. Obviously you'd want to test whatever shell processing is being done before letting it do anything potentially destructive to existing data, so making backup copies is probably prudent.
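    For the "issue shell cmds here" step, a minimal sketch of cd-ing into a directory, running a command and checking its result might look like the following. The helper name run_in_dir is hypothetical; it uses only core modules (Cwd) and the list form of system, which avoids shell quoting problems with the path.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper: chdir into $dir, run @cmd there, then chdir back.
# Returns the command's exit code ($? >> 8, so 0 = success), or -1 if the
# command could not be started. Death by signal is not distinguished here.
sub run_in_dir {
    my ($dir, @cmd) = @_;
    my $where = getcwd();
    chdir $dir or die "Can't chdir to '$dir': $!";
    system(@cmd);                       # list form: no shell quoting issues
    my $status = $? == -1 ? -1 : $? >> 8;
    chdir $where or die "Can't chdir back to '$where': $!";
    return $status;
}

# Example use inside the loop above (stop on the first failure):
# my $rc = run_in_dir($fullpath, 'ls', '-l');
# die "command failed in $fullpath (exit $rc)\n" if $rc != 0;
```

    Returning the exit code rather than dying lets the caller decide which "other conditions" should stop the run.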

    Update: The easiest way I can think of right now to include an end_date, if a little repetitious, is to start the program with the following.

    my $start_date = $ARGV[0] || 'bogus';
    die "bad argument" unless $start_date =~ /^\d{8}$/;
    my $end_date = $ARGV[1] || 'bogus';
    die "bad argument" unless $end_date =~ /^\d{8}$/;
    ...
    And then add a next condition in the for loop to skip processing for files beyond the end_date.
    next if $file > $end_date;
    last if $file < $start_date;
    ...
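    To avoid repeating the validation for each argument, the check could be done in one loop. The sketch below uses a hypothetical helper, parse_dates, which also accepts the two dates in either order; swapping them is an added design choice, not something the original asked for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: validate START_DATE and END_DATE in one loop
# instead of repeating the check, and return them as (older, newer).
sub parse_dates {
    my @dates = @_;
    die "usage: $0 START_DATE END_DATE\n" unless @dates == 2;
    for my $date (@dates) {
        die "bad argument '$date': expected YYYYMMDD\n"
            unless $date =~ /^\d{8}$/;
    }
    my ($start, $end) = @dates;
    ($start, $end) = ($end, $start) if $start > $end;   # accept either order
    return ($start, $end);
}

# Usage: my ($start_date, $end_date) = parse_dates(@ARGV);
```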

Re: processing dates as directories
by McA (Priest) on Aug 09, 2013 at 20:59 UTC
    for dir in `ls -dr /dir/archive/201*`
    do
        where=`pwd`
        cd "$dir"
        echo "your commands"
        cd "$where"
    done

    Not perl, but a solution, isn't it?

    Best regards
    McA

      Yes, that would be a solution... except for the sheer number of directories. The user who asked for a solution wants to pass a start and stop point to the script, and I'm having a difficult time wrapping my head around how to start at a specific location (date), stop at a specific location (date), and process the directories backwards. Thoughts?

        Hey, your solution gave me a thought. The date I pass in is just a number... could I do a numerical < end-date and > start-date comparison inside the loop you have?

        How would that look exactly?
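        Yes, since YYYYMMDD names sort and compare correctly as plain numbers. Here is a minimal sketch of that idea in Perl rather than ksh; the helper name dirs_in_range is hypothetical, and it assumes the base path contains no spaces (Perl's glob splits its pattern on whitespace). The range is inclusive, so it uses <= and >= rather than < and >.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: the YYYYMMDD subdirectories of $base whose names
# lie in [$start, $end], newest first. The names are 8-digit numbers,
# so plain numeric comparisons give the range test.
sub dirs_in_range {
    my ($base, $start, $end) = @_;
    my @hits;
    for my $path (reverse sort glob("$base/20[0-9][0-9][0-9][0-9][0-9][0-9]")) {
        next unless -d $path;
        my $name = basename($path);
        next if $name > $end;     # newer than the range: skip it
        last if $name < $start;   # older than the range: nothing left to do
        push @hits, $path;
    }
    return @hits;
}

# Usage, assuming the /dir/archive layout from the question:
# for my $dir (dirs_in_range('/dir/archive', '20110219', '20110303')) {
#     # chdir, run the command, check the result
# }
```

        Because the list is sorted newest first, the loop can stop with last as soon as it passes the start date instead of scanning every remaining directory.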

        Good morning moesplace,

        I have to admit I overlooked part of your requirement, so I feel I owe you a building block for another way of solving your problem. Other Monks have given different advice and hints. Mine is a little different: it uses DateTime and DateTime::Duration to loop explicitly over every day between your start and end dates. So it's another way to look at your problem, and definitely a Perlish one, showing regexes, top-rated CPAN modules and operator overloading:

        #!/usr/bin/perl
        use strict;
        use warnings;
        use DateTime;    # not a core module: install it from CPAN if missing

        die "ERROR: You have to provide start- and end-date in format 'YYYYMMDD'"
            if @ARGV != 2;
        my ($start_date, $end_date) = @ARGV;

        my ($start_year, $start_month, $start_day) =
            $start_date =~ m/^([0-9]{4})([0-9]{2})([0-9]{2})$/
            or die "ERROR: Start Date doesn't match YYYYMMDD";
        my ($end_year, $end_month, $end_day) =
            $end_date =~ m/^([0-9]{4})([0-9]{2})([0-9]{2})$/
            or die "ERROR: End Date doesn't match YYYYMMDD";

        my $start = DateTime->new(
            year  => $start_year,
            month => $start_month,
            day   => $start_day,
        );
        my $end = DateTime->new(
            year  => $end_year,
            month => $end_month,
            day   => $end_day,
        );

        if ($end < $start) {
            die "ERROR: end date is less than start date.";
        }

        my $one_day = DateTime::Duration->new(days => 1);

        for (my $i = $end; $i >= $start; $i -= $one_day) {
            # print instead of say: 'say' needs perl 5.10, the OP has 5.8.4
            my $output = sprintf('%04d%02d%02d', $i->year, $i->month, $i->day);
            print "$output\n";
            # Check if path exists
            # chdir to it
            # do what you want
            # chdir back
        }

        Best regards
        McA

        UPDATE: In this solution the loop is driven by the generated dates, not by the directories found on disk (which the other approaches then compare against the start and end dates).
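        DateTime is not shipped with Perl, so it may not be available on a stock Solaris 10 perl 5.8.4. The same date-driven loop can be sketched with core modules only, Time::Local and POSIX, which also sidesteps the missing GNU date -d. The helper name dates_between is hypothetical; working in UTC means every day is exactly 86400 seconds, so stepping back one day never lands on the wrong date.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Local qw(timegm);
use POSIX qw(strftime);

# Hypothetical helper: every YYYYMMDD date from $end back to $start,
# computed with core modules only (no DateTime, no GNU date -d).
sub dates_between {
    my ($start, $end) = @_;
    my @epoch;
    for my $date ($start, $end) {
        my ($y, $m, $d) = $date =~ /^(\d{4})(\d{2})(\d{2})$/
            or die "not YYYYMMDD: $date\n";
        push @epoch, timegm(0, 0, 0, $d, $m - 1, $y);   # month is 0-based
    }
    die "end date is before start date\n" if $epoch[1] < $epoch[0];

    my @dates;
    # UTC has no DST, so stepping back 86400 seconds is always one day.
    for (my $t = $epoch[1]; $t >= $epoch[0]; $t -= 86400) {
        push @dates, strftime('%Y%m%d', gmtime($t));
    }
    return @dates;
}

# Usage:
# for my $ymd (dates_between(@ARGV)) {
#     my $dir = "/dir/archive/$ymd";
#     next unless -d $dir;    # skip days with no directory
#     # chdir, run the command, check the result
# }
```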

Re: processing dates as directories
by Anonymous Monk on Aug 10, 2013 at 06:55 UTC

    Maybe this could also help.
    I would take the start and end dates from the command line and check that they are valid.
    See if this helps:

    #!/usr/bin/perl -w
    use strict;
    use Getopt::Long;

    GetOptions(
        'directory=s'  => \my $directory,
        'start_date=s' => \my $start,
        'end_date=s'   => \my $end,
    ) or die "Bad options\n";

    die 'Unknown directory path' unless defined $directory;

    opendir DIRH, $directory or die "can't open directory: $!";
    # -d needs the full path; a bare name would be tested against the cwd
    my @files = grep { $_ ne '.' && $_ ne '..' && -d "$directory/$_" } readdir(DIRH);
    closedir DIRH or die $!;

    @files = file_range(
        files => \@files,
        begin => $start,
        end   => $end,
    );

    for (@files) {
        # do whatever you want in here to each directory;
        # you might have to use chdir, then opendir.
        # Over to you.
        print $_, $/;
    }

    sub file_range {
        my %data = @_;

        # check that both dates are given in full, i.e. 8 digits like 20110910
        for my $date ($data{begin}, $data{end}) {
            die 'One or both of your start or end date is not correct '
                unless defined $date && $date =~ /^\d{8}$/;
        }

        # keep only YYYYMMDD names inside the range, newest first
        return sort { $b <=> $a }
               grep { /^\d{8}$/ && $_ >= $data{begin} && $_ <= $data{end} }
               @{ $data{files} };
    }
    To run it: script.pl -directory . -start 20110219 -end 20120303, where '.' is your directory; change it as needed. (Getopt::Long accepts unique abbreviations, so -start and -end stand for -start_date and -end_date.)