file modifications using file::find

propellerhat has asked for the wisdom of the Perl Monks concerning the following question:

Re: file modifications using file::find by kcott (Archbishop) on Jul 25, 2021 at 10:06 UTC
G'day propellerhat,

Welcome to the Monastery.

I created some directories and populated them with files having varying contents and permissions:

    $ for i in a b c; do cd $i; echo "DIR: `pwd`"; ls -l; cd ..; done
    DIR: /home/ken/tmp/pm_11135362/a
    total 1
    -rw-r--r-- 1 ken None 0 Jul 25 16:44 empty
    ---------- 1 ken None 18 Jul 25 16:45 no_access
    DIR: /home/ken/tmp/pm_11135362/b
    total 1
    -r--r--r-- 1 ken None 20 Jul 25 16:49 read_only
    DIR: /home/ken/tmp/pm_11135362/c
    total 1
    -rw-r--r-- 1 ken None 7 Jul 25 16:51 read_write

I then wrote the following script which does various checks. The `else` blocks (with "OK to READ/WRITE") are where you'd call your reading/writing routines.

    #!/usr/bin/env perl

    use strict;
    use warnings;

    use Cwd;
    use File::Find;

    my $cwd = getcwd();
    my @dirs = map "$cwd/$_", qw{a b c};

    print "--- READING ---\n";
    find(\&wanted_to_read, @dirs);

    print "--- WRITING ---\n";
    find(\&wanted_to_write, @dirs);

    sub wanted_to_read {
        if (! -f $File::Find::name) {
            print "$File::Find::name is not a normal file.\n";
        }
        elsif (-z _) {
            print "$File::Find::name is zero-length.\n";
        }
        elsif (! -r _) {
            print "$File::Find::name is not readable.\n";
        }
        else {
            print "OK to READ: $File::Find::name\n";
        }

        return;
    }

    sub wanted_to_write {
        if (! -f $File::Find::name) {
            print "$File::Find::name is not a normal file.\n";
        }
        elsif (! -r _) {
            print "$File::Find::name is not readable.\n";
        }
        elsif (! -w _) {
            print "$File::Find::name is not writable.\n";
        }
        else {
            print "OK to WRITE: $File::Find::name\n";
        }

        return;
    }

A sample run outputs:

    ken@titan ~/tmp/pm_11135362
    $ ./pm_11135362_file_find_example.pl
    --- READING ---
    /home/ken/tmp/pm_11135362/a is not a normal file.
    /home/ken/tmp/pm_11135362/a/empty is zero-length.
    /home/ken/tmp/pm_11135362/a/no_access is not readable.
    /home/ken/tmp/pm_11135362/b is not a normal file.
    OK to READ: /home/ken/tmp/pm_11135362/b/read_only
    /home/ken/tmp/pm_11135362/c is not a normal file.
    OK to READ: /home/ken/tmp/pm_11135362/c/read_write
    --- WRITING ---
    /home/ken/tmp/pm_11135362/a is not a normal file.
    OK to WRITE: /home/ken/tmp/pm_11135362/a/empty
    /home/ken/tmp/pm_11135362/a/no_access is not readable.
    /home/ken/tmp/pm_11135362/b is not a normal file.
    /home/ken/tmp/pm_11135362/b/read_only is not writable.
    /home/ken/tmp/pm_11135362/c is not a normal file.
    OK to WRITE: /home/ken/tmp/pm_11135362/c/read_write

I don't know what your reading/writing requirements are. See the open function, in the first instance, if you're unsure about that. Feel free to ask further questions about that if need be.

I also don't know what you mean by "IF check". It's not mentioned in the File::Find documentation. By itself, "IF" has a number of potentially valid interpretations in the context of your question (for instance, in "What does IF stand for?"); and you give no indication of what you intend to check. I've used a number of "file tests" which is possibly the sort of thing you want. [See "How do I post a question effectively?" for information on how you can help us to help you.]

Be very careful with specifying directories when using File::Find. I used Cwd for my example code, but that would have various problems in a production environment. The FindBin module may be useful if your target directories are always located relative to your script. Better options are to get the directories from a known source; e.g. a database, config file, or the like.

-- Ken
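As a rough illustration of the FindBin suggestion above, the sketch below builds the search directories relative to the script's own location instead of the current working directory. The helper name `dirs_relative_to_script` and the directory names `a`, `b`, `c` are only assumptions for the example; `$FindBin::Bin` is the directory containing the running script.

```perl
#!/usr/bin/env perl

use strict;
use warnings;

use File::Spec;
use FindBin qw{$Bin};

# Hypothetical helper: turn names such as 'a', 'b', 'c' into paths
# that sit next to this script, wherever it is run from.
sub dirs_relative_to_script {
    my @names = @_;
    return map { File::Spec->catdir($Bin, $_) } @names;
}

my @dirs = dirs_relative_to_script(qw{a b c});

# Only directories that actually exist should be handed to find().
my @existing = grep { -d } @dirs;

print "Would search: $_\n" for @existing;
```

The resulting `@existing` list could be passed to `find()` in place of the `getcwd()`-based `@dirs`. This still hard-codes the names, so a config file or database remains the safer source in production.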
Re^2: file modifications using file::find by Marshall (Canon) on Jul 25, 2021 at 22:18 UTC
Ken, I like your code! Just a few comments:

You snuck in the tests like "-z _". This is completely correct usage. As extra explanation for the OP: a file test is actually a fairly "expensive" file system operation (see File::stat). When doing multiple tests on the same file, use the file name for the first test. That makes Perl stat the file, and the file system returns a structure with all kinds of information in it. For the 2nd, 3rd, etc. tests, use "_" instead of the file name; Perl then answers from the information cached by that last stat request, so these subsequent tests go a lot faster.

About cwd: it is not clear what the OP intends to do with files that pass "whatever the file test(s) are". I strive to do minimal processing within the File::Find wanted routine. The reason is that File::Find will chdir down the directory structure as it goes about its business. If some complicated "process a .pdf file" routine called from within File::Find blows up, you will be left in some random place in the file structure, far removed from whatever directory the script started in. That can complicate error recovery procedures. So usually I just generate a "to-do" list within the file find procedure and then do the actual complicated work once all the files have been found.

Now of course there are a lot of "yeah, buts" to that general approach. Mileage certainly does vary! I am just saying that in my experience, keeping the "wanted routine" simple is a good idea.

Update: The OP wrote:

> After much searching and reading, the articles I have found regarding File::Find do nothing other than list file names, though they begin by saying things such as "do something with file".

In general, I would make an array, `my @found;`, have the "wanted" routine push each applicable `$File::Find::name` onto that array, and then process those files once File::Find has finished its job. Keeping the "wanted" routine simple and restricting its job to just "finding files" to operate upon can save a lot of grief.
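To make the cached-stat point concrete, here is a small sketch that names the file once and then uses `_` for the follow-up tests. The helper name `describe` and the scratch file are assumptions made for the example; the file test operators themselves are standard Perl.

```perl
use strict;
use warnings;

use File::Temp qw{tempfile};

# Create a small scratch file so the example is self-contained.
my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "hello\n";
close $fh or die "Cannot close $path: $!";

# Hypothetical helper: describe a path using a single stat of the file.
sub describe {
    my ($file) = @_;

    # The first test names the file, so Perl stats it here.
    return 'not a plain file' unless -f $file;

    # The following tests use '_' and reuse the cached stat data.
    return 'empty'        if     -z _;
    return 'not readable' unless -r _;

    my $size = -s _;
    return "readable, $size bytes";
}

print describe($path), "\n";
```

Note that `_` always refers to the most recent stat, so any other file test or `stat` call in between would change what it describes.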
Re^3: file modifications using file::find by kcott (Archbishop) on Jul 26, 2021 at 00:22 UTC
G'day Marshall,

> Ken, I like your code!

Thanks. I appreciate the compliment.

> You snuck in the tests like "-z _".

Unfortunately, the OP gave very little information about Perl experience or, indeed, `File::Find` usage requirements. As this was a first post, I didn't make a big deal about it; although, I did provide a link "for information on how you can help us to help you". My main aim was to provide the requested "help or a tutorial".

I did consider including an explanation for the special filehandle, '`_`'; however, as I didn't know if these tests would be used, I chose to add a link to that information. I did use the link text "file tests", in case "-X" was too cryptic. :-)

> About cwd

The last paragraph of my post did include a warning about specifying directories; an explanation that I'd only used `Cwd` for my example code; and, suggested a number of better alternatives. Furthermore, I only used the `getcwd()` function to generate the required `@directories_to_search` for `find()`; there was no explicit changing of directories in my script. Of course, there is the implicit changing of directories as part of `File::Find`'s default behaviour. You can change that: see "File::Find - %options - no_chdir".

Your comments regarding using an array are sound. You may actually want to use the list more than once. I assume you know how to do this, but for the OP or anybody else, here's a very basic (partial) code example:

    ...
    find(\&wanted, @dirs);
    do_something_with(get_found_files());
    check_something_done_with(get_found_files());
    ...
    {
        my @found_files;

        sub get_found_files { return @found_files }

        sub wanted {
            ...
            push @found_files, $File::Find::name;
            ...
        }
    }

Note that the anonymous block makes `@found_files` (lexically) private: only `get_found_files()` and `wanted()` have access to it. You can, of course, modify the list returned, but the original will stay intact. For anyone unfamiliar with this concept, see "perlsub: Private Variables via my()" for a more in-depth discussion of this topic.

-- Ken
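For readers who want to run the private-list idea end to end, here is one possible complete version. It is only a sketch: the temporary directory tree and the `-f` test inside `wanted()` are assumptions standing in for the elided parts of the partial example, not a reconstruction of them.

```perl
use strict;
use warnings;

use File::Find;
use File::Temp qw{tempdir};

# Build a throwaway tree: <tmp>/one.txt and <tmp>/sub/two.txt
my $root = tempdir(CLEANUP => 1);
mkdir "$root/sub" or die "Cannot create $root/sub: $!";
for my $file ("$root/one.txt", "$root/sub/two.txt") {
    open my $fh, '>', $file or die "Cannot create $file: $!";
    close $fh or die "Cannot close $file: $!";
}

{
    my @found_files;    # visible only inside this block

    sub get_found_files { return @found_files }

    sub wanted {
        # Keep the callback trivial: record plain files, nothing more.
        push @found_files, $File::Find::name if -f;
        return;
    }
}

find(\&wanted, $root);

# The list can be consumed more than once; each call returns a copy.
my @first  = sort { $a cmp $b } get_found_files();
my @second = sort { $a cmp $b } get_found_files();

print "$_\n" for @first;
```

Because `get_found_files()` returns a copy, code that sorts or trims its own list does not disturb what the next caller sees.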
Re^4: file modifications using file::find by Marshall (Canon) on Jul 26, 2021 at 01:21 UTC
Re^3: file modifications using file::find by eyepopslikeamosquito (Archbishop) on Jul 26, 2021 at 01:10 UTC
> I strive to do minimal processing within the File::find wanted routine

So do I! I find I grow fewer grey hairs that way. ;)

To illustrate, here's a simple example of using File::Find to find all `.txt` files under the current working directory.

    use strict;
    use warnings;
    use Cwd;
    use File::Find;

    # Return a list of the absolute path of all plain .txt files under $dir
    sub FindTextFiles {
        my $dir = shift;
        my @files;
        # Note: -f = plain file (perldoc -f -X for doco of all file tests)
        find(
            {
                no_chdir => 1,
                wanted   => sub { -f && /\.txt$/ and push @files, $File::Find::name }
            },
            $dir
        );
        return @files;
    }

    my $dir = getcwd();
    my @txtfiles = FindTextFiles($dir);
    print "Found ", scalar(@txtfiles), " text files under dir '$dir'...\n";
    for my $file (@txtfiles) {
        print "file='$file'\n";
        # could add code to modify the found files here ...
    }

Example output of running this program:

    Found 5 text files under dir 'C:/pm/file-find'...
    file='C:/pm/file-find/example.txt'
    file='C:/pm/file-find/fred/f1/zz.txt'
    file='C:/pm/file-find/fred/f2/hello.txt'
    file='C:/pm/file-find/fred/f2/f2a/2a.txt'
    file='C:/pm/file-find/fred/f2/f2a/hello.txt'

Once I've built the list of files, I sometimes set about changing them in-place -- which is surprisingly tricky to do robustly, as described at Re-runnably editing a file in place (see also CPAN File::Replace by haukex, which nicely solves this problem).

A practice exercise for the OP: extend the test program above to change all occurrences of `Peking` to `Beijing` in the .txt files (which I sometimes torture job applicants with :).
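One possible starting point for the exercise, offered as a sketch rather than a robust solution: write the changed text to a temporary file in the same directory, then rename it over the original. The helper name `replace_in_file` is hypothetical. This simple version slurps the whole file, ignores encodings, and does not preserve the original file's permissions or ownership, which are some of the reasons the linked discussion and File::Replace exist.

```perl
use strict;
use warnings;

use File::Basename qw{dirname};
use File::Temp qw{tempfile};

# Hypothetical helper: replace every literal $from with $to in $file.
# Returns the number of replacements made.
sub replace_in_file {
    my ($file, $from, $to) = @_;

    open my $in, '<', $file or die "Cannot read $file: $!";
    my $text = do { local $/; <$in> };
    close $in or die "Cannot close $file: $!";
    $text = '' unless defined $text;    # empty file

    my $count = () = $text =~ /\Q$from\E/g;
    return 0 unless $count;             # nothing to do; leave file untouched

    $text =~ s/\Q$from\E/$to/g;

    # Write the new contents beside the original, then rename over it,
    # so a failure part-way through does not leave a half-written file.
    my ($out, $tmp) = tempfile(DIR => dirname($file));
    print {$out} $text or die "Cannot write $tmp: $!";
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $file or die "Cannot rename $tmp to $file: $!";

    return $count;
}

# Demonstration on a scratch file rather than real data.
my ($fh, $demo) = tempfile(SUFFIX => '.txt', UNLINK => 1);
print {$fh} "Peking duck\nRoad to Peking\n";
close $fh or die "Cannot close $demo: $!";

my $changed = replace_in_file($demo, 'Peking', 'Beijing');
print "Replaced $changed occurrence(s) in $demo\n";
```

In the program above, the call would go inside the `for my $file (@txtfiles)` loop, after the list of files has been built.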
Re: file modifications using file::find (File::Find References) by eyepopslikeamosquito (Archbishop) on Jul 25, 2021 at 12:06 UTC
Welcome to the monastery propellerhat!

It would be great if you could provide us with a bit more context about yourself and your problem, including why you need to solve it:

- Are you an experienced Perl programmer or new to Perl?
- Which platform/s do you need your script to run on? Unix? (if so, which flavour?). Do you require that it runs on Windows too?

That will allow us to provide you with more helpful answers on how to get the most out of Perl's excellent File::Find module.

As indicated at Re^2: Unix shell versus Perl, Perl's File::Find module has many advantages compared to Unix shell and its `find` command. I also know from personal experience that File::Find is portable and works well under Windows too - though there are some pitfalls for the unwary, due to the underlying differences between Unix and Windows file systems.

File::Find References

Added Later

- File::Find (perldoc)
- File::Find::Rule by Richard Clamp (CPAN)
- App::find2perl and find2perl by Leon Timmermans (CPAN) - script to translate Unix find command lines to Perl code
- Re: file modifications using file::find by kcott (2021) - complete and clear sample code using File::Find
- Re^3: file modifications using file::find (2021) - sample Perl code to find all .txt files under current working directory
- Re^3: Strictly nested sub warnings (2010) - sample Perl code to find all symlinks under `$dir` using `File::Find`
- Re^2: Unix shell versus Perl (2008) - performance of Perl `File::Find` vs Unix `find` command
- Properly newline terminate a bunch of files on Unix (2003) - performance of Perl `File::Find` vs Unix `find/xargs`
- find2perl with File::Find and -maxdepth by u914 (2002) - seeking advice on File::Find and `find2perl`
- how to use find() options by ypreisler (2024) - struggling to follow symbolic links (davido shows how)
- find () command does not process all symlinks by ypreisler (2024) - ... still struggling
- Re: How to create symlink using Perl? (Unix symlinks vs Windows Junctions)

See Also

- symlink (`perldoc -f symlink`)
- perlport (symlink)
- Re: Check for another program availability (Running External Processes on Unix and Windows References)
- Re^4: uparse - Parse Unicode strings (locate/find/xargs)
- Unix shell versus Perl
- Re-runnably editing a file in place
- Re: RFC: Self Assessment Perl (2018) - answers to sample interview questions
- File::Replace by haukex (CPAN)
Re: file modifications using file::find by Anonymous Monk on Jul 25, 2021 at 03:32 UTC
> After much searching and reading, I have not located the File::Find specification.

... did you go to http://perldoc.perl.org and type File::Find into the search box, or type `perldoc File::Find` at the command line?
Re: file modifications using file::find by Anonymous Monk on Jul 25, 2021 at 02:14 UTC
perlintro, Path::Tiny, Modern Perl, File::Find::Rule, path tiny, find rule