ovedpo15 has asked for the wisdom of the Perl Monks concerning the following question:

I would like to find all directories which contain a specific file. Until now, we used the following idea:
use File::Find;
use Cwd qw(abs_path);

my @list_of_dirs;
find( sub { get_dirs( \@list_of_dirs, $_ ) }, $root_path );

sub get_dirs {
    my ($dirs_aref, $current_path) = @_;
    my $abs_path    = abs_path($current_path);
    my $file        = $abs_path . "/" . "secret.file";
    my $ignore_file = $abs_path . "/" . ".ignore";
    push @{$dirs_aref}, $abs_path if (-e $file) && !(-e $ignore_file);
}
The problem is that searching a large directory tree can take hours. I'm trying to reduce the waiting time.
My first idea was to split the directories across several processes so they can search in parallel, but I'm not sure that is a good idea.
Can you suggest a better approach?

Replies are listed 'Best First'.
Re: Finding files recursively
by holli (Abbot) on Aug 04, 2019 at 18:09 UTC
    You are fighting the module, and you are doing a lot of unnecessary work. Consider:
    use File::Find;

    my @found;
    my $path   = 'd:\env\videos';
    my $target = '2012.avi';

    find( sub {
        # We're only interested in directories
        return unless -d $_;
        # Bail if there is an .ignore here
        return if -e "$_/.ignore";
        # Add to the results if the target is found here
        push @found, $File::Find::name if -e "$_/$target";
    }, $path );

    print "@found";
    D:\ENV>perl pm10.pl
    d:\env\videos/2012
    D:\ENV>echo.>d:\env\videos\2012\.ignore
    D:\ENV>perl pm10.pl
    D:\ENV>


    holli

    You can lead your users to water, but alas, you cannot drown them.
      Thanks for your suggestion, but I don't understand the difference between the two approaches. Also, what is $target? Thank you again.
        $target is just the filename you are looking for, "secret.file" in your case.
        The difference is that my code exits the wanted function immediately when it is not dealing with a directory. Only if it is a directory does it check whether the target file is in that directory.

        Whereas your code looks at each and every file and calculates its base path (unnecessarily, since that information is already there in $File::Find::name), and then uses that base directory to look for the target file.
        This, and this is the biggest slowdown, also means that you test the same directory once for every entry it contains.


        holli

        You can lead your users to water, but alas, you cannot drown them.
Re: Finding files recursively
by dsheroh (Monsignor) on Aug 05, 2019 at 08:04 UTC
    The problem is that searching a large directory tree can take hours.
    How large are we talking? Does it take hours to run ls -RU over that directory? If so, then there's nothing you can do in Perl to do it faster because that's how long it takes for the disk to retrieve the directory entries. A quick test on my laptop suggests that 1 hour may correspond to about a million directory entries on this machine, but your hardware may vary. Wildly.

    Also, if you're on a *nix box, I'd be willing to bet that the OS's find binary is pretty well optimized. Generating a list of candidate directories with find $STARTING_DIR -name secret.file, then using Perl to run down that list and remove any with a .ignore file would probably be a pretty effective way to do this, albeit less effective as an exercise in using/learning more Perl, if that's your primary objective. There may even be a way to get find to filter out the directories with .ignore files in the first pass, so that you don't have to go back a second time to look for them, but my find-fu isn't up to that task.
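    For concreteness, here is a minimal sketch of that two-pass approach in Perl; the starting directory is a placeholder and error handling is omitted:

    use strict;
    use warnings;
    use File::Basename qw(dirname);

    # First pass: let the OS's find binary locate every secret.file.
    my $start = '/some/start/dir';    # placeholder starting directory
    my @candidates = map { dirname($_) }
                     split /\n/, `find $start -name secret.file`;

    # Second pass: drop any directory that also contains a .ignore file.
    my @dirs = grep { !-e "$_/.ignore" } @candidates;

    print "$_\n" for @dirs;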

    Even if you're going to ultimately write a Perl solution regardless, generating a list of all the secret.files with find is going to be a good sanity check to estimate the absolute fastest possible time the task could be done in.

    My first idea was to split the directories across several processes so they can search in parallel, but I'm not sure that is a good idea.
    If your bottleneck is on disk I/O rather than on processing, then parallelization won't help (if it's already waiting on the disk, having more CPU cores waiting isn't going to make the disk any faster) and may make things significantly worse (by making the disk spend more time jumping from one directory to another, and less time actually reading the data you want).
      Thanks for the reply!
      By a parallel process, I meant using fork(). Consider a directory with multiple subdirectories: I would fork a child for each one, let it find all the valid directories within its subtree, and then merge the resulting arrays (along the lines of the sketch below).
      Is that a bad idea?
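      A minimal sketch of that fork-per-subdirectory idea, using the CPAN module Parallel::ForkManager (my choice of module, not something mentioned in the thread; the root path and the limit of four children are placeholders):

      use strict;
      use warnings;
      use File::Find;
      use Parallel::ForkManager;

      my $root    = '/some/root';                 # placeholder root directory
      my @subdirs = grep { -d } glob "$root/*";

      my @all_dirs;
      my $pm = Parallel::ForkManager->new(4);     # at most 4 children at once

      # Collect each child's result list as it exits.
      $pm->run_on_finish( sub {
          my ($pid, $exit, $ident, $signal, $core, $data) = @_;
          push @all_dirs, @$data if $data;
      } );

      for my $dir (@subdirs) {
          $pm->start and next;                    # parent: spawn and move on
          my @found;
          find( sub {
              return unless -d $_;
              return if -e "$_/.ignore";
              push @found, $File::Find::name if -e "$_/secret.file";
          }, $dir );
          $pm->finish(0, \@found);                # child: ship results back
      }
      $pm->wait_all_children;

      print "$_\n" for @all_dirs;

      As dsheroh notes above, whether this actually helps depends on whether the bottleneck is CPU or disk I/O.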
Re: Finding files recursively
by bliako (Abbot) on Aug 05, 2019 at 02:11 UTC
    Those who cannot remember the past are condemned to repeat it.

    so do cache if the OS does not do this for you already

    and while you wait for your cache to build, this is worth reading (I found)

    ... Before reaching the final line, however, he had already understood that he would never leave that room, for it was foreseen that the city of mirrors (or mirages) would be wiped out by the wind and exiled from the memory of men at the precise moment when Aureliano Babilonia would finish deciphering the parchments, and that everything written on them was unrepeatable since time immemorial and forever more, because races condemned to one hundred years of solitude did not have a second opportunity on earth.

    Bottom line: do cache, but do not cache too much, lest all be wiped out.
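
    For instance, a minimal sketch of caching the scan results; the cache file location, the one-day expiry, and the use of Storable are my assumptions:

    use strict;
    use warnings;
    use File::Find;
    use Storable qw(store retrieve);

    my $root  = '/some/root';                          # placeholder
    my $cache = "$ENV{HOME}/.secretfile_dirs.cache";   # assumed cache location

    my $dirs;
    if ( -e $cache && -M $cache < 1 ) {    # reuse the cache if under a day old
        $dirs = retrieve($cache);
    }
    else {
        my @found;
        find( sub {
            return unless -d $_;
            return if -e "$_/.ignore";
            push @found, $File::Find::name if -e "$_/secret.file";
        }, $root );
        $dirs = \@found;
        store( $dirs, $cache );
    }

    print "$_\n" for @$dirs;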

    bw, bliako

Re: Finding files recursively
by tybalt89 (Monsignor) on Aug 06, 2019 at 14:57 UTC

    If you are on a *nix box with locate, that might be faster.

      If you are on a *nix box with locate, that might be faster.

      But only if you search for files that existed the last time updatedb was run. locate simply queries the database generated by updatedb. Depending on your system, updatedb runs from cron, or it has to be run manually. locate cannot find files that did not yet exist when updatedb last ran.
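
      If the database is fresh, the remaining filter pass in Perl is cheap. A minimal sketch (secret.file and the .ignore convention come from the original question; the rest is my assumption):

      use strict;
      use warnings;
      use File::Basename qw(dirname basename);

      # locate does a substring match over the full paths in its database.
      chomp( my @hits = `locate secret.file` );

      my @dirs = grep { !-e "$_/.ignore" }          # honour the .ignore marker
                 map  { dirname($_) }
                 grep { basename($_) eq 'secret.file' } @hits;

      print "$_\n" for @dirs;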

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      Similarly, mdfind -name foo on OS X (with the advantage that OS X's filesystem metadata DB is updated all but continuously).

      The cake is a lie.
      The cake is a lie.
      The cake is a lie.
