Re: file modifications using file::find

Welcome to the Monastery.

I created some directories and populated them with files having varying contents and permissions:

$ for i in a b c; do cd $i; echo "DIR: `pwd`"; ls -l; cd ..; done
DIR: /home/ken/tmp/pm_11135362/a
total 1
-rw-r--r-- 1 ken None  0 Jul 25 16:44 empty
---------- 1 ken None 18 Jul 25 16:45 no_access
DIR: /home/ken/tmp/pm_11135362/b
total 1
-r--r--r-- 1 ken None 20 Jul 25 16:49 read_only
DIR: /home/ken/tmp/pm_11135362/c
total 1
-rw-r--r-- 1 ken None 7 Jul 25 16:51 read_write

I then wrote the following script which does various checks. The else blocks (with "OK to READ/WRITE") are where you'd call your reading/writing routines.

#!/usr/bin/env perl

use strict;
use warnings;

use Cwd;
use File::Find;

my $cwd = getcwd();
my @dirs = map "$cwd/$_", qw{a b c};

print "--- READING ---\n";
find(\&wanted_to_read, @dirs);

print "--- WRITING ---\n";
find(\&wanted_to_write, @dirs);

sub wanted_to_read {
    if (! -f $File::Find::name) {
        print "$File::Find::name is not a normal file.\n";
    }
    elsif (-z _) {
        print "$File::Find::name is zero-length.\n";
    }
    elsif (! -r _) {
        print "$File::Find::name is not readable.\n";
    }
    else {
        print "OK to READ: $File::Find::name\n";
    }

    return;
}

sub wanted_to_write {
    if (! -f $File::Find::name) {
        print "$File::Find::name is not a normal file.\n";
    }
    elsif (! -r _) {
        print "$File::Find::name is not readable.\n";
    }
    elsif (! -w _) {
        print "$File::Find::name is not writable.\n";
    }
    else {
        print "OK to WRITE: $File::Find::name\n";
    }

    return;
}

A sample run outputs:

ken@titan ~/tmp/pm_11135362
$ ./pm_11135362_file_find_example.pl
--- READING ---
/home/ken/tmp/pm_11135362/a is not a normal file.
/home/ken/tmp/pm_11135362/a/empty is zero-length.
/home/ken/tmp/pm_11135362/a/no_access is not readable.
/home/ken/tmp/pm_11135362/b is not a normal file.
OK to READ: /home/ken/tmp/pm_11135362/b/read_only
/home/ken/tmp/pm_11135362/c is not a normal file.
OK to READ: /home/ken/tmp/pm_11135362/c/read_write
--- WRITING ---
/home/ken/tmp/pm_11135362/a is not a normal file.
OK to WRITE: /home/ken/tmp/pm_11135362/a/empty
/home/ken/tmp/pm_11135362/a/no_access is not readable.
/home/ken/tmp/pm_11135362/b is not a normal file.
/home/ken/tmp/pm_11135362/b/read_only is not writable.
/home/ken/tmp/pm_11135362/c is not a normal file.
OK to WRITE: /home/ken/tmp/pm_11135362/c/read_write

I don't know what your reading/writing requirements are. See the open function, in the first instance, if you're unsure about that. Feel free to ask further questions about that if need be.
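As a starting point for the routines that would go in those else blocks, here's a minimal sketch using the three-argument form of open with lexical filehandles. The sub names read_lines() and append_line() are hypothetical, and appending ('>>') is only an assumption; you might want to read, overwrite ('>') or append, depending on your requirements.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Temp qw{tempdir};

# Hypothetical helper: return all lines of $path, dying with the OS
# error message ($!) if the file can't be opened.
sub read_lines {
    my ($path) = @_;
    open my $in_fh, '<', $path or die "Can't open '$path' for reading: $!";
    my @lines = <$in_fh>;
    close $in_fh;
    return @lines;
}

# Hypothetical helper: append one line to $path (creating it if needed).
sub append_line {
    my ($path, $line) = @_;
    open my $out_fh, '>>', $path or die "Can't open '$path' for appending: $!";
    print {$out_fh} "$line\n";
    close $out_fh or die "Can't close '$path': $!";
    return;
}

# Demo in a throwaway directory, removed automatically at exit.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/read_write";

append_line($file, 'first');
append_line($file, 'second');

my @lines = read_lines($file);
print 'Read ', scalar(@lines), " lines from $file\n";
```

In the script above, you'd call something like read_lines($File::Find::name) in the "OK to READ" branch and append_line($File::Find::name, ...) in the "OK to WRITE" branch.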

I also don't know what you mean by "IF check". It's not mentioned in the File::Find documentation. By itself, "IF" has a number of potentially valid interpretations in the context of your question (see, for instance, "What does IF stand for?"); and you give no indication of what you intend to check. I've used a number of "file tests", which may be the sort of thing you want. [See "How do I post a question effectively?" for information on how you can help us to help you.]

Be very careful with specifying directories when using File::Find. I used Cwd for my example code, but that would have various problems in a production environment; not least, the directories searched would depend on whichever directory the script happened to be run from. The FindBin module may be useful if your target directories are always located relative to your script. Better options are to get the directories from a known source; e.g. a database, config file, or the like.
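For the FindBin case, a minimal sketch, assuming (purely for illustration) that a, b and c are subdirectories of the directory holding the script. $RealBin is exported by FindBin; the missing-directory check is my own addition.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Find;
use FindBin qw{$RealBin};

# Assumption: a, b and c live alongside this script. $RealBin is the
# script's own directory (symlinks resolved), no matter where the
# script was launched from.
my @dirs = map "$RealBin/$_", qw{a b c};

# Report, then skip, any missing directories rather than passing them to find().
for my $dir (@dirs) {
    warn "Missing directory: $dir\n" unless -d $dir;
}
@dirs = grep { -d } @dirs;

my @found;
find(sub { push @found, $File::Find::name if -f }, @dirs) if @dirs;
print "$_\n" for @found;
```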

— Ken

Re^2: file modifications using file::find by Marshall (Canon) on Jul 25, 2021 at 22:18 UTC
Ken, I like your code! Just a few comments:

You snuck in the tests like "-z _". This is completely correct usage. As extra explanation for the OP: a file test is actually a fairly "expensive" file system operation (see File::stat). When doing multiple tests on the same file, test the file name for the first one. This makes a stat() call, which returns a structure with all kinds of information about the file from the file system. For the 2nd, 3rd, etc. tests, use "_" instead of the file name; this lets Perl answer from the cached result of that last stat() call, so these subsequent tests go a lot faster.

About cwd: It is not clear what the OP intends to do with files that pass "whatever the file test(s) are". I strive to do minimal processing within the File::Find wanted routine. The reason is that File::Find will chdir down the directory structure as it goes about its business. If something "blows up" (maybe some complicated "process a .pdf file" routine called from within File::Find), you will be left in some random place in the directory structure, far removed from whatever directory the script started in. That can complicate error-recovery procedures. So usually I just generate a "to-do" list within the wanted routine and then do the actual complicated work once all the files have been found. Now of course there are a lot of "yeah, buts" to that general approach. Mileage certainly does vary! I am just saying that, in my experience, keeping the "wanted" routine simple is a good idea.

Update: The OP wrote:

> After much searching and reading, the articles I have have found regarding File::Find do nothing other than list file names, though they begin by saying things such as "do something with file".

In general, I would make an array, `my @found;`, have the "wanted" routine push each applicable $File::Find::name onto that array, and then process those files once File::Find has finished its job. Keeping the "wanted" routine simple and restricting its job to just "finding files" to operate upon can save a lot of grief.
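To make the "_" point concrete, a minimal sketch: one real file test on the path, then further tests against the special filehandle "_", which reuse the stat buffer from that first test instead of stat'ing the file again. The temporary file is just a stand-in for whatever file you're testing.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Temp qw{tempfile};

# Create a small throwaway file to test against.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "some data\n";
close $fh;

# The first file test does the real stat() on $path ...
my $is_plain = -f $path;

# ... and these reuse that stat result via the special filehandle "_".
my $is_readable = -r _;
my $is_writable = -w _;
my $size        = -s _;

printf "%s: plain=%d readable=%d writable=%d size=%d\n",
    $path, ($is_plain ? 1 : 0), ($is_readable ? 1 : 0),
    ($is_writable ? 1 : 0), $size;
```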
Re^3: file modifications using file::find by kcott (Archbishop) on Jul 26, 2021 at 00:22 UTC
G'day Marshall,

"Ken, I like your code!"

Thanks. I appreciate the compliment.

'You snuck in the tests like "-z _".'

Unfortunately, the OP gave very little information about Perl experience or, indeed, `File::Find` usage requirements. As this was a first post, I didn't make a big deal about it; although, I did provide a link "for information on how you can help us to help you". My main aim was to provide the requested "help or a tutorial". I did consider including an explanation for the special filehandle, '`_`'; however, as I didn't know if these tests would be used, I chose to add a link to that information. I did use the link text "file tests", in case "-X" was too cryptic. :-)

"About cwd"

The last paragraph of my post did include a warning about specifying directories; an explanation that I'd only used `Cwd` for my example code; and, suggested a number of better alternatives. Furthermore, I only used the `getcwd()` function to generate the required `@directories_to_search` for `find()`; there was no explicit changing of directories in my script. Of course, there is the implicit changing of directories as part of `File::Find`'s default behaviour. You can change that: see "File::Find - %options - no_chdir".

Your comments regarding using an array are sound. You may actually want to use the list more than once. I assume you know how to do this, but for the OP or anybody else, here's a very basic (partial) code example:

...
find(\&wanted, @dirs);
do_something_with(get_found_files());
check_something_done_with(get_found_files());
...

{
    my @found_files;

    sub get_found_files { return @found_files }

    sub wanted {
        ...
        push @found_files, $File::Find::name;
        ...
    }
}

Note that the anonymous block makes `@found_files` (lexically) private: only `get_found_files()` and `wanted()` have access to it. You can, of course, modify the list returned, but the original will stay intact.

For anyone unfamiliar with this concept, see "perlsub: Private Variables via my()" for a more in-depth discussion of this topic.

— Ken
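A complete, runnable version of that partial pattern might look like the sketch below. The throwaway directory tree is invented for the demo, and it also uses the no_chdir option mentioned above, so File::Find never changes directory. Here the private block comes first; the subs close over the same @found_files either way.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Find;
use File::Temp qw{tempdir};

{
    # Private to this block: only get_found_files() and wanted() see it.
    my @found_files;

    sub get_found_files { return @found_files }

    sub wanted {
        # With no_chdir => 1, $_ holds the full path (the same as
        # $File::Find::name) and File::Find doesn't chdir as it goes.
        push @found_files, $File::Find::name if -f;
        return;
    }
}

# Build a tiny throwaway tree: one file at the top, one in a subdirectory.
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "Can't mkdir '$root/sub': $!";
for my $path ("$root/one.txt", "$root/sub/two.txt") {
    open my $fh, '>', $path or die "Can't create '$path': $!";
    close $fh;
}

find({ wanted => \&wanted, no_chdir => 1 }, $root);

# Fetch the list as often as needed; emptying a copy leaves the
# private original untouched.
my @files = sort { $a cmp $b } get_found_files();
print "$_\n" for @files;
@files = ();
my @again = get_found_files();
print 'Copy emptied; private list still has ', scalar(@again), " files\n";
```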
Re^4: file modifications using file::find by Marshall (Canon) on Jul 26, 2021 at 01:21 UTC
Hi kcott! Yes, my reply to your post was meant to be a compliment. I have no idea what the OP is trying to accomplish; there is a huge amount left unsaid. As a very simple example, I present this code (create found, populate found, do something with found):

#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# lists all "normal files" which are both readable and
# writable underneath "C:/test"

my @found;
find( { wanted => \&get_files }, "C:/test" );
print "$_\n" for @found;

sub get_files
{
    return unless ( -f $File::Find::name and -r _ and -w _);
    push @found, $File::Find::name;
}
__END__
printed on my machine:
Note that the subdirectory "subdirtest" itself is not listed, because a
directory name fails the -f test; the files inside it are still found.
C:/test/maur-1110.tiff
C:/test/maur-1111.psd
C:/test/TUMI-1354839054_alt1_.psd
C:/test/subdirtest/docinbsubdir.txt
C:/test/subdirtest/New Text Document.txt
Re^3: file modifications using file::find by eyepopslikeamosquito (Archbishop) on Jul 26, 2021 at 01:10 UTC
> I strive to do minimal processing within the File::find wanted routine

So do I! I find I grow fewer grey hairs that way. ;)

To illustrate, here's a simple example of using File::Find to find all `.txt` files under the current working directory.

use strict;
use warnings;
use Cwd;
use File::Find;

# Return a list of the absolute paths of all plain .txt files under $dir
sub FindTextFiles {
    my $dir = shift;
    my @files;
    # Note: -f = plain file (perldoc -f -X for doco of all file tests)
    find( { no_chdir => 1,
            wanted   => sub { -f && /\.txt$/ and push @files, $File::Find::name }
          }, $dir );
    return @files;
}

my $dir = getcwd();
my @txtfiles = FindTextFiles($dir);
print "Found ", scalar(@txtfiles), " text files under dir '$dir'...\n";
for my $file (@txtfiles) {
    print "file='$file'\n";
    # could add code to modify the found files here ...
}

Example output of running this program:

Found 5 text files under dir 'C:/pm/file-find'...
file='C:/pm/file-find/example.txt'
file='C:/pm/file-find/fred/f1/zz.txt'
file='C:/pm/file-find/fred/f2/hello.txt'
file='C:/pm/file-find/fred/f2/f2a/2a.txt'
file='C:/pm/file-find/fred/f2/f2a/hello.txt'

Once I've built the list of files, I sometimes set about changing them in-place -- which is surprisingly tricky to do robustly, as described at Re-runnably editing a file in place (see also CPAN File::Replace by haukex, which nicely solves this problem).

A practice exercise for the OP: extend the test program above to change all occurrences of `Peking` to `Beijing` in the .txt files (which I sometimes torture job applicants with :).
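One possible answer to the exercise, as a simplified sketch only: it does not use File::Replace's API, but hand-rolls the common "write a temporary file, then rename it over the original" approach. The helper name replace_in_file() and the demo files are hypothetical. Collecting the file list first keeps the wanted routine trivial, as recommended above.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Basename qw{dirname};
use File::Find;
use File::Temp qw{tempfile tempdir};

# Hypothetical helper: change every "Peking" in $path to "Beijing".
# The new content goes to a temp file in the same directory (so the
# rename stays within one file system), which is then renamed over the
# original; an interrupted run never leaves a half-written file.
# Simplified: reads the whole file into memory and does not preserve
# the original's permissions or ownership.
sub replace_in_file {
    my ($path) = @_;

    open my $in_fh, '<', $path or die "Can't read '$path': $!";
    my $text = do { local $/; <$in_fh> };
    close $in_fh;

    my $count = $text =~ s/Peking/Beijing/g;
    return 0 unless $count;    # nothing to change: leave the file alone

    my ($out_fh, $tmp) = tempfile(DIR => dirname($path), UNLINK => 0);
    print {$out_fh} $text;
    close $out_fh or die "Can't write '$tmp': $!";
    rename $tmp, $path or die "Can't rename '$tmp' to '$path': $!";
    return $count;
}

# Demo on a throwaway tree.
my $root = tempdir(CLEANUP => 1);
my %content = (
    'a.txt' => "Peking duck\nPeking opera\n",
    'b.txt' => "no match here\n",
);
for my $name (sort keys %content) {
    my $path = "$root/$name";
    open my $fh, '>', $path or die "Can't create '$path': $!";
    print {$fh} $content{$name};
    close $fh;
}

# Find first, then edit.
my @txtfiles;
find({ no_chdir => 1,
       wanted   => sub { -f && /\.txt$/ and push @txtfiles, $File::Find::name } },
     $root);

my %changed;
$changed{$_} = replace_in_file($_) for @txtfiles;
print "$_: $changed{$_} replacement(s)\n" for sort keys %changed;
```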