in reply to file modifications using file::find

G'day propellerhat,

Welcome to the Monastery.

I created some directories and populated them with files having varying contents and permissions:

$ for i in a b c; do cd $i; echo "DIR: `pwd`"; ls -l; cd ..; done
DIR: /home/ken/tmp/pm_11135362/a
total 1
-rw-r--r-- 1 ken None 0 Jul 25 16:44 empty
---------- 1 ken None 18 Jul 25 16:45 no_access
DIR: /home/ken/tmp/pm_11135362/b
total 1
-r--r--r-- 1 ken None 20 Jul 25 16:49 read_only
DIR: /home/ken/tmp/pm_11135362/c
total 1
-rw-r--r-- 1 ken None 7 Jul 25 16:51 read_write

I then wrote the following script which does various checks. The else blocks (with "OK to READ/WRITE") are where you'd call your reading/writing routines.

#!/usr/bin/env perl

use strict;
use warnings;

use Cwd;
use File::Find;

my $cwd = getcwd();
my @dirs = map "$cwd/$_", qw{a b c};

print "--- READING ---\n";
find(\&wanted_to_read, @dirs);

print "--- WRITING ---\n";
find(\&wanted_to_write, @dirs);

sub wanted_to_read {
    if (! -f $File::Find::name) {
        print "$File::Find::name is not a normal file.\n";
    }
    elsif (-z _) {
        print "$File::Find::name is zero-length.\n";
    }
    elsif (! -r _) {
        print "$File::Find::name is not readable.\n";
    }
    else {
        print "OK to READ: $File::Find::name\n";
    }

    return;
}

sub wanted_to_write {
    if (! -f $File::Find::name) {
        print "$File::Find::name is not a normal file.\n";
    }
    elsif (! -r _) {
        print "$File::Find::name is not readable.\n";
    }
    elsif (! -w _) {
        print "$File::Find::name is not writable.\n";
    }
    else {
        print "OK to WRITE: $File::Find::name\n";
    }

    return;
}

A sample run outputs:

ken@titan ~/tmp/pm_11135362
$ ./pm_11135362_file_find_example.pl
--- READING ---
/home/ken/tmp/pm_11135362/a is not a normal file.
/home/ken/tmp/pm_11135362/a/empty is zero-length.
/home/ken/tmp/pm_11135362/a/no_access is not readable.
/home/ken/tmp/pm_11135362/b is not a normal file.
OK to READ: /home/ken/tmp/pm_11135362/b/read_only
/home/ken/tmp/pm_11135362/c is not a normal file.
OK to READ: /home/ken/tmp/pm_11135362/c/read_write
--- WRITING ---
/home/ken/tmp/pm_11135362/a is not a normal file.
OK to WRITE: /home/ken/tmp/pm_11135362/a/empty
/home/ken/tmp/pm_11135362/a/no_access is not readable.
/home/ken/tmp/pm_11135362/b is not a normal file.
/home/ken/tmp/pm_11135362/b/read_only is not writable.
/home/ken/tmp/pm_11135362/c is not a normal file.
OK to WRITE: /home/ken/tmp/pm_11135362/c/read_write

I don't know what your reading/writing requirements are. See the open function, in the first instance, if you're unsure about that. Feel free to ask further questions about that if need be.
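
For anyone unsure where to start with open, here is a minimal sketch of the three-argument form for reading and for appending. The helper names read_lines and append_line are hypothetical, and the real reading/writing routines would depend on requirements that the question does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: return all lines of a file as a list
sub read_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path for reading: $!";
    my @lines = <$fh>;
    close $fh;
    return @lines;
}

# Hypothetical helper: append a single line to a file
sub append_line {
    my ($path, $line) = @_;
    open my $fh, '>>', $path or die "Cannot open $path for appending: $!";
    print {$fh} "$line\n";
    close $fh or die "Cannot close $path: $!";
    return;
}
```

Either helper could be called from the "OK to READ" or "OK to WRITE" branches, or later from a list of paths collected during the search.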

I also don't know what you mean by "IF check". It's not mentioned in the File::Find documentation. By itself, "IF" has a number of potentially valid interpretations in the context of your question (for instance, in "What does IF stand for?"); and you give no indication of what you intend to check. I've used a number of "file tests" which is possibly the sort of thing you want. [See "How do I post a question effectively?" for information on how you can help us to help you.]

Be very careful with specifying directories when using File::Find. I used Cwd for my example code, but that would have various problems in a production environment. The FindBin module may be useful if your target directories are always located relative to your script. Better options are to get the directories from a known source; e.g. a database, config file, or the like.
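
As a rough illustration of the FindBin suggestion, the sketch below builds the search list relative to the script's own location instead of the current working directory. The subdirectory names a, b and c are the ones from the example above; dirs_relative_to_script is a hypothetical helper name, and the approach assumes the target directories really do live next to the script.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;
use File::Spec;
use FindBin qw($Bin);

# Hypothetical helper: resolve names relative to the directory holding this script
sub dirs_relative_to_script {
    my @names = @_;
    return map { File::Spec->catdir($Bin, $_) } @names;
}

# Only search directories that actually exist
my @dirs = grep { -d } dirs_relative_to_script(qw{a b c});

# List the plain files found; skipped entirely when nothing exists to search
find(sub { print "$File::Find::name\n" if -f }, @dirs) if @dirs;
```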

— Ken

Re^2: file modifications using file::find
by Marshall (Canon) on Jul 25, 2021 at 22:18 UTC
    Ken, I like your code! Just a few comments:

    You snuck in the tests like "-z _". This is completely correct usage. As extra explanation for the OP: a file test operation is actually a fairly "expensive" file system operation (see File::stat). When doing multiple tests on the same file, use the file name for the first test. This causes a structure with all kinds of information to be returned from the file system. For the 2nd, 3rd, etc. tests, use "_" instead of the file name; this lets Perl reuse the cached information from that last stat request, so the subsequent tests go a lot faster.
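
The caching described above can be sketched as follows. This is only an illustration: describe_file is a hypothetical helper, and the temporary file exists purely for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: classify a path using one stat call plus cached tests
sub describe_file {
    my ($path) = @_;
    return 'not a normal file' unless -f $path;   # the real stat happens here
    return 'zero-length'       if -z _;           # reuses the cached stat data
    return 'not readable'      unless -r _;       # still no new stat call
    return 'readable';
}

# Demonstration on a throwaway temporary file
my ($fh, $tmp) = tempfile(UNLINK => 1);
print "empty file:  ", describe_file($tmp), "\n";
print {$fh} "some data\n";
close $fh;
print "after write: ", describe_file($tmp), "\n";
```

The first call should report the file as zero-length and the second as readable. Note that "_" always refers to the most recent stat, so it is only safe while no other file test or stat on a different file has happened in between.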

    About cwd: it is not clear what the OP intends to do with files that pass "whatever the file test(s) are". I strive to do minimal processing within the File::Find wanted routine. The reason is that File::Find will chdir down the directory structure as it goes about its business. If some complicated routine called from within the wanted routine (say, "process a .pdf file") blows up, you will be left in some random place in the file structure, far removed from whatever directory the script started in. That can complicate error recovery procedures. So usually I just generate a "to-do" list within the wanted routine and then do the actual complicated work once all the files have been found. Now of course there are a lot of "yeah, buts" to that general approach. Mileage certainly does vary! I am just saying that in my experience, keeping the "wanted routine" simple is a good idea.

    Update: The OP wrote: "After much searching and reading, the articles I have found regarding File::Find do nothing other than list file names, though they begin by saying things such as 'do something with file'." In general, I would make an array, my @found; have the "wanted" routine push each applicable $File::Find::name onto that array; and then process those files once File::Find has finished its job. Keeping the "wanted" routine simple and restricting its job to just "finding files" to operate upon can save a lot of grief.

      G'day Marshall,

      "Ken, I like your code!"

      Thanks. I appreciate the compliment.

      'You snuck in the tests like "-z _".'

      Unfortunately, the OP gave very little information about Perl experience or, indeed, File::Find usage requirements. As this was a first post, I didn't make a big deal about it; although, I did provide a link "for information on how you can help us to help you".

      My main aim was to provide the requested "help or a tutorial". I did consider including an explanation for the special filehandle, '_'; however, as I didn't know if these tests would be used, I chose to add a link to that information. I did use the link text "file tests", in case "-X" was too cryptic. :-)

      "About cwd"

      The last paragraph of my post did include a warning about specifying directories, an explanation that I'd only used Cwd for my example code, and a number of suggested better alternatives. Furthermore, I only used the getcwd() function to generate the required @directories_to_search for find(); there was no explicit changing of directories in my script.

      Of course, there is the implicit changing of directories as part of File::Find's default behaviour. You can change that: see "File::Find - %options - no_chdir".
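
To make the effect of no_chdir concrete, here is a small sketch that records what the wanted routine sees in each mode. The helper name collect_seen and the temporary directory are assumptions made for the demonstration only.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Find;
use File::Temp qw(tempdir);

# Hypothetical helper: record $_ and the working directory for each plain file
sub collect_seen {
    my ($dir, $no_chdir) = @_;
    my @seen;
    find(
        {
            no_chdir => $no_chdir,
            wanted   => sub {
                return unless -f;    # tests $_, which is valid in both modes
                push @seen, { underscore => $_, cwd => getcwd() };
            },
        },
        $dir
    );
    return @seen;
}

# Throwaway directory containing a single file, for demonstration only
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/demo.txt" or die "Cannot create $dir/demo.txt: $!";
close $fh;

my $start = getcwd();
for my $mode (0, 1) {
    for my $seen (collect_seen($dir, $mode)) {
        printf "no_chdir=%d  \$_=%s  moved=%s\n",
            $mode, $seen->{underscore},
            ($seen->{cwd} eq $start ? 'no' : 'yes');
    }
}
```

With the default behaviour, $_ holds just the file's base name and the working directory follows the search; with no_chdir set, $_ is the same as $File::Find::name and the working directory stays put.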

      Your comments regarding using an array are sound. You may actually want to use the list more than once. I assume you know how to do this, but for the OP or anybody else, here's a very basic (partial) code example:

      ...
      find(\&wanted, @dirs);
      do_something_with(get_found_files());
      check_something_done_with(get_found_files());
      ...

      {
          my @found_files;

          sub get_found_files { return @found_files }

          sub wanted {
              ...
              push @found_files, $File::Find::name;
              ...
          }
      }

      Note that the anonymous block makes @found_files (lexically) private: only get_found_files() and wanted() have access to it. You can, of course, modify the list returned, but the original will stay intact. For anyone unfamiliar with this concept, see "perlsub: Private Variables via my()" for a more in-depth discussion of this topic.

      — Ken

        Hi kcott!

        Yes, my reply to your post was meant to be a compliment.

        I have no idea of what the OP is trying to accomplish. There is a huge amount that is left unsaid.

        As a very simple example, I present this code (create found, populate found, do something with found):

        #!/usr/bin/perl
        use strict;
        use warnings;
        use File::Find;

        # lists all "normal files" which are both readable and
        # writeable underneath "C:/test"

        my @found;
        find( { wanted => \&get_files }, "C:/test" );

        print "$_\n" for @found;

        sub get_files {
            return unless ( -f $File::Find::name and -r _ and -w _);
            push @found, $File::Find::name;
        }

        __END__
        printed on my machine:
        Note that the sub directory of "subdirtest" is skipped.
        A directory name fails the -f test.

        C:/test/maur-1110.tiff
        C:/test/maur-1111.psd
        C:/test/TUMI-1354839054_alt1_.psd
        C:/test/subdirtest/docinbsubdir.txt
        C:/test/subdirtest/New Text Document.txt

      > I strive to do minimal processing within the File::Find wanted routine
      So do I! I find I grow fewer grey hairs that way. ;) To illustrate, here's a simple example of using File::Find to find all .txt files under the current working directory.
      use strict;
      use warnings;
      use Cwd;
      use File::Find;

      # Return a list of the absolute path of all plain .txt files under $dir
      sub FindTextFiles {
          my $dir = shift;
          my @files;
          # Note: -f = plain file (perldoc -f -X for doco of all file tests)
          find(
              {
                  no_chdir => 1,
                  wanted   => sub { -f && /\.txt$/ and push @files, $File::Find::name }
              },
              $dir
          );
          return @files;
      }

      my $dir = getcwd();
      my @txtfiles = FindTextFiles($dir);
      print "Found ", scalar(@txtfiles), " text files under dir '$dir'...\n";
      for my $file (@txtfiles) {
          print "file='$file'\n";
          # could add code to modify the found files here ...
      }

      Example output of running this program:

      Found 5 text files under dir 'C:/pm/file-find'...
      file='C:/pm/file-find/example.txt'
      file='C:/pm/file-find/fred/f1/zz.txt'
      file='C:/pm/file-find/fred/f2/hello.txt'
      file='C:/pm/file-find/fred/f2/f2a/2a.txt'
      file='C:/pm/file-find/fred/f2/f2a/hello.txt'

      Once I've built the list of files, I sometimes set about changing them in-place -- which is surprisingly tricky to do robustly, as described at Re-runnably editing a file in place (see also CPAN File::Replace by haukex, which nicely solves this problem).
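
One common way to approach in-place changes, and not necessarily the one taken in the linked node or by File::Replace, is to write the new contents to a temporary file in the same directory and rename it over the original only once it is complete. The sketch below assumes line-by-line processing; rewrite_file and the foo/bar substitution are hypothetical and chosen just for illustration.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: rewrite $path line by line through $transform,
# replacing the original only after the new copy is completely written.
sub rewrite_file {
    my ($path, $transform) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";

    # Temp file in the same directory, so rename() stays on one filesystem
    my ($out, $tmp) = tempfile('rewriteXXXXXX', DIR => dirname($path));

    while (my $line = <$in>) {
        print {$out} $transform->($line) or die "Cannot write $tmp: $!";
    }
    close $in;
    close $out or die "Cannot close $tmp: $!";

    # Carry over the original permission bits (tempfile creates mode 0600)
    chmod((stat $path)[2] & 07777, $tmp) or die "Cannot chmod $tmp: $!";

    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return;
}

# Demonstration on a throwaway file
my ($fh, $demo) = tempfile(UNLINK => 1);
print {$fh} "foo one\nfoo two\n";
close $fh;
rewrite_file($demo, sub { my $l = shift; $l =~ s/foo/bar/g; return $l });
```

This sketch leaves the temporary file behind if it dies part-way through, and perlfunc warns that the behaviour of rename varies between systems, so treat it as a starting point and not as a robust solution.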

      A practice exercise for the OP: extend the test program above to change all occurrences of Peking to Beijing in the .txt files (which I sometimes torture job applicants with :).