xc63 has asked for the wisdom of the Perl Monks concerning the following question:

Basically I'm working in a directory and I have about eight subdirectories' worth of files that I need to analyze. It's not a large fileset, but I'm basically looking to stat them all. I think this code should report the last access time, modify time, and last create time. Based on the timestamps, if I wanted to also add start time, end time, and how many batches were associated with each file, could someone show me how I would modify this?:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Time::localtime;
use File::stat;

for (@ARGV) {
    print "\nFile: $_";
    print "\n Last access time:   ", ctime( stat($_)->atime );
    print "\n Last modify time:   ", ctime( stat($_)->mtime );
    print "\n File creation time: ", ctime( stat($_)->ctime );
}
The only thing I'm iffy on is how to feed the files in the eight subdirectories into that code. Would it be easier to copy/save the files that I need to analyze to my home directory and point the code at that location? I'm still pretty new to Perl, so any help would be greatly appreciated.

Replies are listed 'Best First'.
Re: Analyzing Files Within a Directory
by Marshall (Canon) on Mar 15, 2017 at 00:24 UTC
    I am not sure about all of what you want to do. However, when dealing with multiple directories, the module File::Find is often the place to start. This will recurse down from a starting directory and call a subroutine for every file and directory underneath the starting directory. Some example code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    $| = 1;    # turn buffering off on STDOUT so error messages
               # to unbuffered STDERR wind up in the right time-line order

    my @directories_to_search = ('.', 'another dir here');

    find( \&process_each_file, @directories_to_search );

    sub process_each_file {
        return unless -f $_;    # only plain files (no directories)
        print "$File::Find::name ctime: ", -C _, "\n";
        print "$File::Find::name mtime: ", -M _, "\n";
        print "$File::Find::name atime: ", -A _, "\n";
    }
    File::Find will do a stat on the file for each file test operator. stat() returns a big list of information which is cached. Above I used the special variable, underscore ("_"), to access different parameters of that cached information. This is not required, but it is much faster, since only a local cache is consulted rather than requiring another "expensive" file system operation. However, with only 8 directories, I doubt this will matter at all in terms of performance.
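    To make the caching idea concrete, here is a minimal sketch (the file name is just a placeholder):

    use strict;
    use warnings;

    my $file = 'example.txt';    # hypothetical file name
    if ( -f $file ) {            # this test stat()s the file and caches the result
        print "days since modification: ", -M _, "\n";    # reuses the cache
        print "days since last access:  ", -A _, "\n";    # no extra stat() call
    }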

    Modify @directories_to_search with either paths relative to where this script executes from, or absolute paths, as per your OS requirements.

    I have no idea what you mean by "start" and "end" times. Can you clarify that?

    Update: I looked at this again, and your terminology of "last create time" set off some warning bells. There isn't any such thing. atime is the last time the contents were read or written. mtime is the last time that the file contents were modified. ctime is "change time", not create time: it is the most recent of mtime or the last time the file permissions were changed. Whenever anything about a file changes (except its access time), its ctime changes. A Windows NTFS file system does have the idea of a creation or "born on" time. There are some wild quirks about that, and I think we would go far astray talking about that now. I highly doubt that you actually mean "creation time". A special API is needed to get this "creation time" on Windows; -C $filename will not do it.

    Under almost all circumstances, the parameter of most interest is the mtime: the time that the contents of the file last changed.
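    If you want human-readable timestamps rather than the ages-in-days that -A/-M/-C give you, File::stat and Time::localtime (which your original code already loads) work together; a sketch, again with a placeholder file name:

    use File::stat;
    use Time::localtime;

    my $file = 'example.txt';                  # placeholder
    my $st = stat($file) or die "stat: $!";    # File::stat's stat() returns an object
    print "atime: ", ctime( $st->atime ), "\n";    # last access
    print "mtime: ", ctime( $st->mtime ), "\n";    # last content change
    print "ctime: ", ctime( $st->ctime ), "\n";    # last inode change, NOT creation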

      I tend to use a construct like this

      sub find_temp {
          my $dir = shift;
          my @txts;
          find(
              sub {
                  return unless -f $File::Find::name;
                  push @txts, $File::Find::name;
              },
              $dir . '/tmp/'
          );
          return \@txts;
      }    # find_temp
      these days, since I want to use the list later. In this case I might change
      push @txts, $File::Find::name;
      to
      push @txts, [ $File::Find::name, -C _, -M _, -A _ ];
      even if @txts is a bad name for it now. I knew the only things in those dirs were .txt files and subdirs.
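      The caller can then walk the returned arrayref; a quick sketch of how I might use the arrayref-of-arrayrefs form (the starting directory is just a placeholder):

      my $files = find_temp('/some/dir');    # placeholder path
      for my $rec (@$files) {
          my ( $name, $ctime, $mtime, $atime ) = @$rec;
          printf "%s  ctime=%.2f  mtime=%.2f  atime=%.2f (days)\n",
              $name, $ctime, $mtime, $atime;
      }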

      Thanks so much for that info; sorry it took me a little while to update this thread. I think when it comes to start and end times, I'm looking for when the files are first and last read and/or written. I am also attempting to parse out the results by field, e.g., the batch ID/job ID. I'm trying to make modifications to figure out how to do the aforementioned. Will File::Find make it easier to find each file while also specifying each individual file's batch ID/job ID numbers?
        "last read and/or written." that sounds like you need atime, access time. You could have say a file that has not been changed for a year, but was accessed just a second ago for reading. That might be true of a "work horse" program that is often used, but seldom modified.

        For the second part, "parse out the results by field, e.g., the batch ID/job ID": that sounds like you need some sort of regex to do file name matching. You might want to consider File::Find::Rule. In more complicated scenarios, it can make the program logic easier to understand and implement. My requirements are usually straightforward enough that I don't need it, but you should at least be aware of the option.
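        For example, something along these lines (the name pattern is only a guess at what your batch IDs might look like):

        use File::Find::Rule;

        # collect plain files whose names contain something like a batch ID
        my @files = File::Find::Rule->file
                                    ->name( qr/batch_\d+/ )    # hypothetical naming scheme
                                    ->in( @directories_to_search );
        print "$_\n" for @files;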

        Update: A few more comments:

        Will File::Find make it easier to find each file while also specifying each individual file's batch ID/job ID numbers? File::Find solves the problem of writing the code that recursively descends through the directory structure. This is well-tested code that works. You can then focus on the job of deciding what to do with each file. In O/S file system lingo, a directory is actually just another type of "file". The -f test will tell you whether a name is a plain file, as opposed to a directory or some kind of link. Note that the directories '.' and '..' exist in every directory, but they are normally skipped.
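        A tiny sketch of those file tests, just to make the distinction concrete:

        # classify a name; $name would be whatever File::Find hands you
        sub classify {
            my $name = shift;
            if    ( -l $name ) { print "$name is a symbolic link\n"; }    # lstat-based test
            elsif ( -d $name ) { print "$name is a directory\n"; }
            elsif ( -f $name ) { print "$name is a plain file\n"; }
            else               { print "$name is something else\n"; }
        }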

        Also note that huck made some good suggestions, although I am not sure if your level of experience allows you to completely understand his code. There are some "above beginner" aspects to it. Nothing derogatory is intended.

        I suggest you start with my code as a prototype and see how you get on with that. By all means ask if you have questions.

Re: Analyzing Files Within a Directory
by stevieb (Canon) on Mar 14, 2017 at 21:50 UTC

    Welcome to the Monastery, xc63!

    Can you please elaborate a bit on what your criteria are?

    • Is it a single directory structure?
    • Do you need to recurse into *all* sub directories?
    • Are there constraints on file types to work on (i.e., extension, etc.)?
    • Do you *need* the information coming in from the command line?
      Single directory structure. I am only attempting to use this against several subdirectories. There are no file type constraints. I would definitely prefer the information coming in from the command line for this particular task.
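      Given those requirements, a minimal sketch that ties together the pieces above: starting directories come in from the command line, File::Find handles the recursion, and File::stat/Time::localtime print readable timestamps.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Find;
      use File::stat;
      use Time::localtime;

      die "usage: $0 dir [dir ...]\n" unless @ARGV;

      find( sub {
          return unless -f $_;            # plain files only
          my $st = stat($_) or return;    # File::stat object for the current file
          print "$File::Find::name\n";
          print "  atime: ", ctime( $st->atime ), "\n";
          print "  mtime: ", ctime( $st->mtime ), "\n";
          print "  ctime: ", ctime( $st->ctime ), "\n";
      }, @ARGV );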