Re: Finding files recursively
by holli (Abbot) on Aug 04, 2019 at 18:09 UTC
You are fighting the module, and you are doing a lot of unnecessary work. Consider
use File::Find;

my @found;
my $path   = 'd:\env\videos';
my $target = '2012.avi';

find( sub {
    # We're only interested in directories
    return unless -d $_;

    # Bail if there is an .ignore here
    return if -e "$_/.ignore";

    # Add to the results if the target is found here
    push @found, $File::Find::name
        if -e "$_/$target";
}, $path );

print "@found";
D:\ENV>perl pm10.pl
d:\env\videos/2012
D:\ENV>echo.>d:\env\videos\2012\.ignore
D:\ENV>perl pm10.pl
D:\ENV>
holli
You can lead your users to water, but alas, you cannot drown them.
Thanks for your suggestion, but I don't understand the difference between the two suggestions. Also, what is $target? Thank you again.
Re: Finding files recursively
by dsheroh (Monsignor) on Aug 05, 2019 at 08:04 UTC
The problem is that finding over a large directory could take hours.
How large are we talking? Does it take hours to run ls -RU over that directory? If so, then there's nothing you can do in Perl to do it faster because that's how long it takes for the disk to retrieve the directory entries. A quick test on my laptop suggests that 1 hour may correspond to about a million directory entries on this machine, but your hardware may vary. Wildly.
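If you want to measure that floor from Perl rather than with ls, a rough timing sketch like the following, which only walks the tree and counts entries, should give a comparable number. The path is a stand-in; adjust to taste.

use strict;
use warnings;
use File::Find;
use Time::HiRes qw(time);

my $start   = time;
my $entries = 0;
find( sub { $entries++ }, '/path/to/big/tree' );   # stand-in path
printf "%d entries scanned in %.1f seconds\n", $entries, time - $start;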
Also, if you're on a *nix box, I'd be willing to bet that the OS's find binary is pretty well optimized. Generating a list of candidate directories with find $STARTING_DIR -name secret.file, then using Perl to run down that list and remove any with a .ignore file would probably be a pretty effective way to do this, albeit less effective as an exercise in using/learning more Perl, if that's your primary objective. There may even be a way to get find to filter out the directories with .ignore files in the first pass, so that you don't have to go back a second time to look for them, but my find-fu isn't up to that task.
Even if you're going to ultimately write a Perl solution regardless, generating a list of all the secret.files with find is going to be a good sanity check to estimate the absolute fastest possible time the task could be done in.
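For what it's worth, a hedged sketch of that two-pass idea, with Perl driving the system's find(1) through a piped open (so it assumes a *nix environment); $STARTING_DIR and secret.file are the stand-ins used above:

use strict;
use warnings;

my $starting_dir = '/path/to/start';   # stand-in for $STARTING_DIR
open my $fh, '-|', 'find', $starting_dir, '-name', 'secret.file'
    or die "Cannot run find: $!";

my @found;
while ( my $path = <$fh> ) {
    chomp $path;
    ( my $dir = $path ) =~ s{/[^/]+\z}{};      # directory holding secret.file
    push @found, $dir unless -e "$dir/.ignore";
}
close $fh;
print "$_\n" for @found;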
My first idea was to split the directories into the process so they will perform a parallel search but I'm not sure if that a good idea.
If your bottleneck is on disk I/O rather than on processing, then parallelization won't help (if it's already waiting on the disk, having more CPU cores waiting isn't going to make the disk any faster) and may make things significantly worse (by making the disk spend more time jumping from one directory to another, and less time actually reading the data you want).
Thanks for the reply!
By parallel process, I meant to use fork(). Consider a directory with multiple subdirectories. I would fork a child for each subdirectory, find all the valid directories within it, and then merge the arrays.
Is it a bad idea?
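To make the question concrete, here is a minimal sketch of that idea, assuming the CPAN module Parallel::ForkManager (not core) to handle the forking and to merge each child's array back into the parent. The path and target are reused from the example further up; as the parent reply points out, whether this helps at all depends on whether the bottleneck is disk I/O.

use strict;
use warnings;
use File::Find;
use Parallel::ForkManager;   # CPAN module, not core

my $path   = 'd:/env/videos';   # top directory, reused from the example above
my $target = '2012.avi';
my @found;

my $pm = Parallel::ForkManager->new(4);   # at most 4 children at a time
$pm->run_on_finish( sub {
    my ( $pid, $exit, $ident, $signal, $core, $data ) = @_;
    push @found, @{ $data // [] };        # merge one child's results
} );

opendir my $dh, $path or die "Cannot open $path: $!";
my @subdirs = grep { -d "$path/$_" && $_ !~ /^\.\.?$/ } readdir $dh;
closedir $dh;

for my $sub (@subdirs) {
    $pm->start and next;                  # parent: move on to the next subdir
    my @local;
    find( sub {
        return unless -d $_;
        return if -e "$_/.ignore";
        push @local, $File::Find::name if -e "$_/$target";
    }, "$path/$sub" );
    $pm->finish( 0, \@local );            # child: hand results back and exit
}
$pm->wait_all_children;
print "@found";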
Re: Finding files recursively
by bliako (Abbot) on Aug 05, 2019 at 02:11 UTC
Those who cannot remember the past are condemned to repeat it.
So do cache, if the OS does not already do this for you. And while you wait for your cache to build, this is worth reading (I found):
... Before reaching the final line, however, he had already understood that he would never leave that room, for it was foreseen that the city of mirrors (or mirages) would be wiped out by the wind and exiled from the memory of men at the precise moment when Aureliano Babilonia would finish deciphering the parchments, and that everything written on them was unrepeatable since time immemorial and forever more, because races condemned to one hundred years of solitude did not have a second opportunity on earth.
Bottom line: do cache, but do not cache too much, lest all be wiped out.
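For what "do cache" might look like in practice, a hedged sketch using the core Storable module to keep the previous scan's results in a file and reuse them while they are less than a day old (an arbitrary cutoff); the path and target are reused from the example at the top of the thread:

use strict;
use warnings;
use File::Find;
use Storable qw(store retrieve);

my $cache  = 'found.cache';
my $path   = 'd:/env/videos';
my $target = '2012.avi';

my @found;
if ( -e $cache && -M $cache < 1 ) {       # cache younger than one day
    @found = @{ retrieve($cache) };
}
else {
    find( sub {
        return unless -d $_;
        return if -e "$_/.ignore";
        push @found, $File::Find::name if -e "$_/$target";
    }, $path );
    store \@found, $cache;
}
print "@found";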
bw, bliako
Re: Finding files recursively
by tybalt89 (Monsignor) on Aug 06, 2019 at 14:57 UTC
If you are on a *nix box with locate, that might be faster.
If you are on a *nix box with locate, that might be faster.
But only if you search for files that already existed the last time updatedb was run. locate simply queries the database generated by updatedb. Depending on your system, updatedb runs from cron, or it has to be run manually. locate can't find files that did not yet exist when updatedb last ran.
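If that caveat is acceptable, a hedged sketch of the locate route, reusing the .ignore filter from the find(1) sketch earlier in the thread (the target name is just the example from the top of the thread):

use strict;
use warnings;

open my $fh, '-|', 'locate', '2012.avi'
    or die "Cannot run locate: $!";
while ( my $path = <$fh> ) {
    chomp $path;
    ( my $dir = $path ) =~ s{/[^/]+\z}{};      # directory holding the match
    print "$dir\n" unless -e "$dir/.ignore";
}
close $fh;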
Alexander
--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)