in reply to Re^2: Finding files recursively
in thread Finding files recursively

$target is just the filename you are looking for, "secret.file" in your case.
The difference is that my code exits the wanted function immediately when it is not dealing with a directory. Only when it encounters a directory does it check whether the target file is in that directory.

Whereas your code looks at each and every file, calculates its base path (unnecessarily, since that info is already there in $File::Find::name), and then uses that base directory to look for the target file.
This also means, and this is the biggest slowdown, that you test the same directory once for every entry it contains.
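A minimal sketch of that idea (not the exact code posted earlier; the target name and start directory are placeholders):

use strict;
use warnings;
use File::Find;

my $target = 'secret.file';    # the file being searched for
my @found;

find({
    no_chdir => 1,             # keep $_ as the full path, no chdir per directory
    wanted   => sub {
        return unless -d;                          # leave immediately for anything that is not a directory
        push @found, $File::Find::name
            if -e "$File::Find::name/$target";     # one existence test per directory
    },
}, '/path/to/search');                             # placeholder start directory

print "$_\n" for @found;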


holli

You can lead your users to water, but alas, you cannot drown them.

Replies are listed 'Best First'.
Re^4: Finding files recursively
by ovedpo15 (Pilgrim) on Aug 05, 2019 at 06:40 UTC
    Back with results! :)
    I tried my code and your code. My code ran for 13858 seconds and your code ran for 16968 seconds. I thought it would reduce the time at least a little, but it didn't; maybe the machine was being used by others at the time, but it is still a big difference. Do you have any other suggestions? 4 hours for a search is quite a lot of time :(
      You should probably test on a smaller data set then? Anyway, I'm getting different results, my original code being roughly 55% faster on my single-user machine (as expected).

      I added a native Perl implementation that walks the tree itself with no overhead and that gains you another significant speed boost.
      D:\ENV>perl pm10.pl
      Holli (New).      Found: 1 ( D:\env\Videos/2012 )  Time: -19
      Holli (original). Found: 1 ( d:\env/Videos/2012 )  Time: -32
      ovedpo15.         Found: 1 ( d:/env/Videos/2012 )  Time: -51
      Using this code.
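      The benchmark code itself isn't reproduced here, but a manual tree walk without File::Find's per-entry callback overhead might look roughly like this (a sketch only, with placeholder target name and start directory, and symlinked directories skipped to avoid loops):

      use strict;
      use warnings;

      my $target = 'secret.file';            # placeholder target file name
      my @found;
      my @queue  = ('/path/to/search');      # placeholder start directory

      while (my $dir = shift @queue) {
          opendir(my $dh, $dir) or next;     # skip unreadable directories
          for my $entry (readdir $dh) {
              next if $entry eq '.' || $entry eq '..';
              my $path = "$dir/$entry";
              if (-d $path && !-l $path) {
                  push @queue, $path;        # descend into this directory later
              }
              elsif ($entry eq $target) {
                  push @found, $dir;         # remember the directory holding the target
              }
          }
          closedir $dh;
      }

      print "$_\n" for @found;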


      holli

      You can lead your users to water, but alas, you cannot drown them.
        Thank you for the good answer. It does reduce the time, but not by much (about 10 minutes out of 4 hours), so I'm still hunting for more ideas.
        In the following link: https://stackoverflow.com/questions/2681360/whats-the-fastest-way-to-get-directory-and-subdirs-size-on-unix-using-perl
        Someone suggested:

        I once faced a similar problem, and used a parallelization approach to speed it up. Since you have ~20 top-tier directories, this might be a pretty straightforward approach for you to try. Split your top-tier directories into several groups (how many groups is best is an empirical question), call fork() a few times and analyze directory sizes in the child processes. At the end of the child processes, write out your results to some temporary files. When all the children are done, read the results out of the files and process them.

        Is it possible to show what he means? I thought maybe I could implement a smart subroutine that finds the big directories containing subdirectories, use that idea to collect all the valid dirs, and then merge them into one array (rough sketch of my understanding below). Thank you again.
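        My rough understanding of the fork() suggestion, sketched (the top-level directories, group count, and temp-file names are illustrative, and this is untested):

        use strict;
        use warnings;
        use File::Find;

        my $target   = 'secret.file';                       # placeholder target file name
        my @top_dirs = grep { -d } glob '/search/root/*';   # hypothetical ~20 top-tier directories
        my $groups   = 4;                                   # number of child processes

        # Split the top-tier directories into roughly equal groups.
        my @group;
        push @{ $group[ $_ % $groups ] }, $top_dirs[$_] for 0 .. $#top_dirs;

        my @tmp_files;
        for my $i (0 .. $#group) {
            my $tmp = "/tmp/find_result.$$.$i";             # one result file per child
            push @tmp_files, $tmp;

            defined( my $pid = fork() ) or die "fork failed: $!";
            next if $pid;                                   # parent: start the next child

            # Child: search only its group of directories, write hits to its temp file.
            open my $out, '>', $tmp or die "open $tmp: $!";
            find(sub {
                print {$out} "$File::Find::dir\n" if $_ eq $target;
            }, @{ $group[$i] });
            close $out;
            exit 0;
        }

        # Parent: wait for all children, then merge their results into one array.
        wait() for 0 .. $#group;
        my @found;
        for my $tmp (@tmp_files) {
            open my $in, '<', $tmp or next;
            chomp( my @lines = <$in> );
            push @found, @lines;
            close $in;
            unlink $tmp;
        }

        print "$_\n" for @found;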