PerlMonks  

Find new files in tree

by weini (Friar)
on Apr 29, 2002 at 08:56 UTC

weini has asked for the wisdom of the Perl Monks concerning the following question:

Dear brethren,
I have access to a huge fileshare (Win NT) and want to get an overview of what has happened in the last n days.

So I wrote a script to
  - walk through the file tree
  - search for files 'newer' than n days
  - print the results to an HTML file.

The main thing is done using File::Find and the function

    find(\&wanted, $dir);

    sub wanted {
        if ((-f "$_") && (-M "$_" < $age)) {
            # get stats and push $_ in array
        }
    }

So far, so good. But the script takes some five hours to finish (due to the size of the fileshare and the network traffic). Now I'm asking for your input on how to do this better and faster.

I'd like to run the script regularly to keep the info up to date. A possible solution might be to create a database and keep looping through the fileshare, updating the db whenever a new file exists or a file has changed since the last visit.
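The database idea above could be sketched roughly like this. It is only an illustration: the helper name `scan_changes` and the Storable state file are assumptions, not part of the original script, and a real database could take the place of the state file.

```perl
use strict;
use warnings;
use File::Find;
use Storable qw(retrieve nstore);

# scan_changes and the state-file layout are illustrative assumptions.
# Compares each file's mtime with the value saved by the previous run
# and returns the paths that are new or modified since then.
sub scan_changes {
    my ($dir, $state_file) = @_;
    my $seen = -e $state_file ? retrieve($state_file) : {};
    my (%now, @changed);
    find(
        sub {
            return unless -f $_;
            my $mtime = (stat _)[9];    # reuse the stat done by -f
            my $path  = $File::Find::name;
            $now{$path} = $mtime;
            push @changed, $path
                if !exists $seen->{$path} || $seen->{$path} != $mtime;
        },
        $dir
    );
    nstore(\%now, $state_file);    # remember this run for the next one
    return @changed;
}
```

Note that this still stats every file on each pass, so a single pass is not faster. The possible gain is that the report can be built from the stored state at any time while the scan keeps looping in the background.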

Thanx for any other suggestions!
BTW: Yes, I've read "maintain control over very many files".

weini

Replies are listed 'Best First'.
Re: Find new files in tree
by grinder (Bishop) on Apr 29, 2002 at 10:42 UTC
    There's not much more you can do from here, apart from the fact that you could be statting the file twice. You should be using
    -f $_ and -M _ < $age

    but this will only give marginal improvements. If you install Perl on the distant server and run the script from there (i.e., locally, so as to exclude the network traffic cost from the script), does it run any faster?
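    For reference, a minimal sketch of the single-stat test inside a File::Find callback. The helper name `find_recent` is made up for illustration, and `$age` is assumed to be in days, as `-M` reports:

```perl
use strict;
use warnings;
use File::Find;

# find_recent is a hypothetical helper name, not part of File::Find.
# Returns the paths under $dir whose modification age is below $age days.
sub find_recent {
    my ($dir, $age) = @_;
    my @recent;
    find(
        sub {
            # -f $_ stats the file once; the bare _ filehandle reuses
            # that cached stat buffer, so -M costs no extra system call.
            push @recent, $File::Find::name if -f $_ and -M _ < $age;
        },
        $dir
    );
    return @recent;
}
```

    Something like `my @new = find_recent($dir, 7);` would then collect the files modified within the last week.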

    Unless you've modified the default registry settings, NT will reflect a change in a file timestamp in the directory timestamp as well. You may be able to use this as a test to see whether you have to stat all the files in the directory to see which file changed.

    On the other hand, I can't remember whether under NT it is possible to count the number of links to the current directory to determine whether it has any child directories (and thus, whether you have to traverse it or not to continue descending the tree).

    perl -le 'print( (stat $_)[3] ) for @ARGV' . .. /tmp

    And tr/'/"/ for Win32

    Damn, I just tested and it doesn't work. On Unix, if the number of links to '.' is 2, then you know it is referred to only by itself and its parent. If the number is higher, then a subdirectory must be referring to it as a parent. Unfortunately, on Windows 95 and Windows NT, nlink always returns 1.
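    The link-count check described above could look roughly like this on a filesystem that reports Unix-style link counts. The helper name `may_have_subdirs` is hypothetical, and a count below 2 is treated as "unknown", which is the case reported here for Win32:

```perl
use strict;
use warnings;

# may_have_subdirs is a hypothetical helper name.
# Returns 1 if the link count implies subdirectories, 0 if it implies
# none, and undef when the count is unusable (nlink < 2, as on Win32).
sub may_have_subdirs {
    my ($dir) = @_;
    my $nlink = (stat $dir)[3];
    return unless defined $nlink && $nlink >= 2;
    return $nlink > 2 ? 1 : 0;
}
```

    A caller would have to fall back to a full traversal whenever the result is undef.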


    print@_{sort keys %_},$/if%_=split//,'= & *a?b:e\f/h^h!j+n,o@o;r$s-t%t#u'
Re: Find new files in tree
by belg4mit (Prior) on Apr 29, 2002 at 12:26 UTC
    Try Win32::ChangeNotify.

    --
    perl -pew "s/\b;([mnst])/'$1/g"

Re: Find new files in tree
by particle (Vicar) on Apr 29, 2002 at 13:23 UTC
    i believe -M $_ < $age and -f _ will offer a little more improvement over grinder's suggestion. (this assumes there will be fewer modified things than files, so the second test is performed less often. )

    but i'd use Win32::API and call native OS commands. they *should* be faster. you can get documentation for the win32 sdk online at msdn

    ~Particle *accelerates*

      Re: win32::API
      Eh hemm? Win32::ChangeNotify? It uses Win32::IPC, since IIRC, the windows kernel tracks file modifications already (which is why you can have explorer auto-refresh).

      --
      perl -pew "s/\b;([mnst])/'$1/g"

        Eh hemm? Win32::ChangeNotify?
        no. as i understand the original poster's requirement:
        "I have access to a huge fileshare (Win NT) and want to get an overview of what's happened the last n days"
        weini wants a post-fact report. Win32::ChangeNotify -- from the doc:
        "Monitor events related to files and directories"
        allows you to monitor files and directories in real-time. your suggestion meets a different requirement than that mentioned by the original poster.

        ~Particle *accelerates*

Re: Find new files in tree
by Rich36 (Chaplain) on Apr 29, 2002 at 13:58 UTC

    You could try forking a few processes and dividing the work up between them. Having each child read recursively down a separate directory should speed things up. You could feed it a list of directories to open, or even have your application find the directories itself, and fork off processes based on the number of directories.
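    A rough sketch of that idea, with one child per directory and a temp file per child for passing results back. The name `parallel_recent` and the temp-file handoff are assumptions made for illustration; on Win32, Perl's fork is emulated, so any gain there is not guaranteed.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempfile);

# parallel_recent is a hypothetical helper: one child per directory,
# each writing its matches to its own temp file for the parent to merge.
sub parallel_recent {
    my ($age, @dirs) = @_;
    my %out_for;    # child pid => temp file holding that child's results
    for my $dir (@dirs) {
        my ($fh, $tmp) = tempfile();
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {    # child: scan one subtree, then exit
            find(
                sub {
                    print {$fh} "$File::Find::name\n"
                        if -f $_ and -M _ < $age;
                },
                $dir
            );
            close $fh;
            exit 0;
        }
        close $fh;          # parent keeps only the file name
        $out_for{$pid} = $tmp;
    }
    my @recent;
    for my $pid (keys %out_for) {
        waitpid $pid, 0;    # wait for this child before reading its file
        open my $in, '<', $out_for{$pid} or die "open: $!";
        chomp(my @lines = <$in>);
        close $in;
        push @recent, @lines;
        unlink $out_for{$pid};
    }
    @recent = sort @recent;
    return @recent;
}
```

    File names containing newlines would break the line-based handoff, so treat this as a starting point only.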


    Rich36
    There's more than one way to screw it up...
