xc63 has asked for the wisdom of the Perl Monks concerning the following question:

Basically I'm working in a directory and I have about eight subdirectories' worth of files that I need to analyze. It's not a large fileset, but I'm basically looking to stat them all. I think this code should report the last access time, modify time, and last create time. Based on the timestamps, if I wanted to also add start time, end time, and how many batches were associated with each file, could someone show me how I would modify this?:

#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use Time::localtime;
use File::stat;

for (@ARGV) {
    print "\nFile: $_";
    print "\n Last access time:   ", ctime( stat($_)->atime );
    print "\n Last modify time:   ", ctime( stat($_)->mtime );
    print "\n File creation time: ", ctime( stat($_)->ctime );
}
The only thing I'm iffy on is how to feed the files in the eight subdirectories into that code. Would it be easier to copy/save the files that I need to analyze to my home directory and point the code at that location? I'm still pretty new to Perl, so any help would be greatly appreciated.

Replies are listed 'Best First'.
Re: Analyzing Files Within a Directory
by Marshall (Canon) on Mar 15, 2017 at 00:24 UTC
    I am not sure about all of what you want to do. However, when dealing with multiple directories, the module File::Find is often the place to start. This will recurse down from a starting directory and call a subroutine for every file and directory underneath the starting directory. Some example code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use File::Find;

    $| = 1;    # turn buffering off on STDOUT so error messages
               # to unbuffered STDERR wind up in the right time-line order

    my @directories_to_search = ('.', 'another dir here');

    find( \&process_each_file, @directories_to_search );

    sub process_each_file {
        return unless -f $_;    # only plain files (no directories)
        print "$File::Find::name ctime: ", -C _, "\n";
        print "$File::Find::name mtime: ", -M _, "\n";
        print "$File::Find::name atime: ", -A _, "\n";
    }
    File::Find will do a stat on the file for each file test operator. stat() returns a big list of information which is cached. Above I used the special variable, underscore ("_"), to access different parameters of that cached information. This is not required, but it is much faster, since only a local cache is consulted rather than requiring another "expensive" file system operation. However, with only 8 directories, I doubt this will matter at all in terms of performance.
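    To make the caching idea concrete, here is a minimal sketch (the file name is just a placeholder):

    use strict;
    use warnings;

    my $file = 'example.txt';    # hypothetical file name
    if ( -f $file ) {            # this test stat()s the file and caches the result
        print "days since modification: ", -M _, "\n";    # reuses the cache
        print "days since last access:  ", -A _, "\n";    # no extra stat() call
    }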

    Modify @directories_to_search with either paths relative to where this script executes from, or absolute paths, as per your OS requirements.

    I have no idea what you mean by "start" and "end" times. Can you clarify that?

    Update: I looked at this again, and your terminology of "last create time" set off some warning bells. There isn't any such thing. atime is the last time the contents were read or written. mtime is the last time that the file contents were modified. ctime is "change time", not create time: it is the most recent of mtime or the last time the file permissions were changed. Whenever anything about a file changes (except its access time), its ctime changes. A Windows NTFS file system does have the idea of a creation or "born on" time. There are some wild quirks about that, and I think we would go far astray talking about that now. I highly doubt that you actually mean "creation time". A special API is needed to get this "creation time" on Windows; -C $filename will not do it.

    Under almost all circumstances, the parameter of most interest is the mtime: the time that the contents of the file last changed.
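    If you want human-readable timestamps rather than the ages-in-days that -A/-M/-C give you, File::stat and Time::localtime (which your original code already loads) work together; a sketch, again with a placeholder file name:

    use File::stat;
    use Time::localtime;

    my $file = 'example.txt';                  # placeholder
    my $st = stat($file) or die "stat: $!";    # File::stat's stat() returns an object
    print "atime: ", ctime( $st->atime ), "\n";    # last access
    print "mtime: ", ctime( $st->mtime ), "\n";    # last content change
    print "ctime: ", ctime( $st->ctime ), "\n";    # last inode change, NOT creation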

      I tend to use a construct like this

      sub find_temp {
          my $dir = shift;
          my @txts;
          find(
              sub {
                  return unless -f $File::Find::name;
                  push @txts, $File::Find::name;
              },
              $dir . '/tmp/'
          );
          return \@txts;
      }    # find_temp
      these days, since I want to use the list later. In this case I might change
      push @txts, $File::Find::name;
      to
      push @txts, [ $File::Find::name, -C _, -M _, -A _ ];
      even if @txts is a bad name for it now. I knew the only things in those dirs were .txt files and subdirs.
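      The caller can then walk the returned arrayref; a quick sketch of how I might use the arrayref-of-arrayrefs form (the starting directory is just a placeholder):

      my $files = find_temp('/some/dir');    # placeholder path
      for my $rec (@$files) {
          my ( $name, $ctime, $mtime, $atime ) = @$rec;
          printf "%s  ctime=%.2f  mtime=%.2f  atime=%.2f (days)\n",
              $name, $ctime, $mtime, $atime;
      }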

      Thanks so much for that info; sorry it took me a little while to update this thread. I think when it comes to start and end times, I'm looking for when the files are first and last read and/or written. I am also attempting to parse out the results by field, e.g., the batch ID/job ID. I'm trying to make modifications to figure out how to do the aforementioned. Will File::Find make it easier to find each file while also specifying each individual file's batch ID/job ID numbers?
        "last read and/or written." that sounds like you need atime, access time. You could have say a file that has not been changed for a year, but was accessed just a second ago for reading. That might be true of a "work horse" program that is often used, but seldom modified.

        For the second part, "parse out the results by field, e.g., the batch ID/job ID": that sounds like you need some sort of regex to do file name matching. You might want to consider File::Find::Rule. In more complicated scenarios, it can make the program logic easier to understand and implement. My requirements are usually straightforward enough that I don't need it, but you should at least be aware of the option.
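        For example, something along these lines (the name pattern is only a guess at what your batch IDs might look like):

        use File::Find::Rule;

        # collect plain files whose names contain something like a batch ID
        my @files = File::Find::Rule->file
                                    ->name( qr/batch_\d+/ )    # hypothetical naming scheme
                                    ->in( @directories_to_search );
        print "$_\n" for @files;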

        Update: A few more comments:

        Will File::Find make it easier to find each file while also specifying each individual file's batch ID/job ID numbers? File::Find solves the problem of writing the code that recursively descends through the directory structure. This is well-tested code that works. You can then focus on the job of deciding what to do with each file. In O/S file system lingo, a directory is actually just another type of "file". The -f test will tell you whether a name is a plain file, as opposed to a directory or some kind of link. Note that the directories '.' and '..' exist in every directory, but they are normally skipped.
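        A tiny sketch of those file tests, just to make the distinction concrete:

        # classify a name; $name would be whatever File::Find hands you
        sub classify {
            my $name = shift;
            if    ( -l $name ) { print "$name is a symbolic link\n"; }    # lstat-based test
            elsif ( -d $name ) { print "$name is a directory\n"; }
            elsif ( -f $name ) { print "$name is a plain file\n"; }
            else               { print "$name is something else\n"; }
        }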

        Also note that huck made some good suggestions, although I am not sure if your level of experience allows you to completely understand his code. There are some "above beginner" aspects to it. Nothing derogatory is intended.

        I suggest you start with my code as a prototype and see how you get on with that. By all means ask if you have questions.

Re: Analyzing Files Within a Directory
by stevieb (Canon) on Mar 14, 2017 at 21:50 UTC

    Welcome to the Monastery, xc63!

    Can you please elaborate a bit on what your criteria are?

    • Is it a single directory structure?
    • Do you need to recurse into *all* sub directories?
    • Are there constraints on file types to work on (i.e., extension, etc.)?
    • Do you *need* the information coming in from the command line?
      Single directory structure. I am only attempting to use this against several subdirectories. There are no file type constraints. I would definitely prefer the information coming in from the command line for this particular task.
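      Given those requirements, a minimal sketch that ties together the pieces above: starting directories come in from the command line, File::Find handles the recursion, and File::stat/Time::localtime print readable timestamps.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use File::Find;
      use File::stat;
      use Time::localtime;

      die "usage: $0 dir [dir ...]\n" unless @ARGV;

      find( sub {
          return unless -f $_;            # plain files only
          my $st = stat($_) or return;    # File::stat object for the current file
          print "$File::Find::name\n";
          print "  atime: ", ctime( $st->atime ), "\n";
          print "  mtime: ", ctime( $st->mtime ), "\n";
          print "  ctime: ", ctime( $st->ctime ), "\n";
      }, @ARGV );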