in reply to File::Find on huge, dynamic filesystems?

I'm running this script on a Solaris machine with large NFS (v3) mounted filesystems from NetApp filers. I just need to walk the filesystem searching for various file types, and then either compress or delete those files depending on their age. I'm sure this same script has been written a hundred times before, but I couldn't find one anywhere, so I'm doing it myself. It's going to be a time-consuming process no matter what, since I have to stat every file to get the size. Using GNU find is certainly an option, but I assumed (incorrectly perhaps) that it would be quicker to do all the work in Perl. Thanks.

Re: Re: File::Find on huge, dynamic filesystems?
by submersible_toaster (Chaplain) on Dec 10, 2002 at 23:32 UTC

    Hang about! If I have this correct, you want to traverse an NFS tree and act on files that match your logic, but you are concerned about the FS changing under your feet. Could you nibble away at the output of whichever finder mechanism you choose, thus (hopefully) shrinking the time between 'knowing' about a file and acting on it?


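    This is off-the-cuff code, OK?

    # Read find's output through a pipe ('-|'), handling each path as it arrives.
    open( my $find, '-|', '/usr/bin/find', @findargs ) or die "Cannot run find: $!";
    while ( my $path = <$find> ) {
        chomp $path;    # strip the newline so the file tests see the real name
        next unless $path =~ /$myfileMatch/;
        if ( -M $path > 14 ) {
            # file modified more than 14 days ago: delete file magic
        }
        else {
            # compress file magic
        }
    }
    close $find;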

    This way open is happily running find in a forked child whilst the script does the dirty work. Of course, if you are compressing monster files across NFS from a filer, there is going to be a performance hit of some kind.

    Of course this may not be what you want, and doing it this way I strongly recommend the script keeps a record of exactly what the hell it's doing.
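
    As a rough illustration of keeping such a record, here is a minimal sketch. The helper name log_action and the log line layout are my own assumptions, not anything from the original script; it just appends one timestamped line per action.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper (name and format are assumptions): append one
# timestamped line per action so there is a record of what was done.
sub log_action {
    my ( $logfile, $action, $path ) = @_;
    open( my $log, '>>', $logfile ) or die "Cannot append to $logfile: $!";
    printf {$log} "%s %s %s\n",
        strftime( '%Y-%m-%d %H:%M:%S', localtime ), $action, $path;
    close($log) or die "Cannot close $logfile: $!";
}

# Usage (paths are placeholders):
#   log_action( '/var/tmp/cleanup.log', 'DELETE', $path );
```

    Opening the log for every line is slower than holding one handle open, but each entry reaches the file even if the script dies halfway through a long run.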

    Update: Since you are concerned about files that are ageing, as opposed to new files, it perhaps does not matter too much that new data is added to this tree as the script runs. Reading back, methinks it's better to build a static list of matching files to operate on, then run through that list at the end of the search, checking that each file does indeed still exist (-e).
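
    The build-a-list-then-recheck idea from the update could be sketched roughly like this. The sub names collect_candidates and process_candidates, and the callback arrangement, are assumptions made up for illustration; the actual delete and compress steps are left to whatever you pass in.

```perl
use strict;
use warnings;

# Phase 1 (hypothetical helper): run the finder once and keep only the
# paths that match $pattern. @find_cmd is the command plus its arguments.
sub collect_candidates {
    my ( $pattern, @find_cmd ) = @_;
    open( my $find, '-|', @find_cmd ) or die "Cannot run $find_cmd[0]: $!";
    my @candidates;
    while ( my $path = <$find> ) {
        chomp $path;
        push @candidates, $path if $path =~ $pattern;
    }
    # find often exits non-zero when a directory vanishes mid-walk.
    close($find) or warn "$find_cmd[0] exited with status $?\n";
    return @candidates;
}

# Phase 2 (hypothetical helper): re-check each path just before acting,
# because the tree may have changed since the scan.
sub process_candidates {
    my ( $max_age_days, $on_old, $on_recent, @candidates ) = @_;
    for my $path (@candidates) {
        next unless -e $path;    # gone since the scan, skip it
        my $age = -M _;          # reuse the stat from -e; days before script start
        if   ( $age > $max_age_days ) { $on_old->($path) }
        else                          { $on_recent->($path) }
    }
}

# Usage (directory and pattern are placeholders):
#   my @files = collect_candidates( qr/\.log\z/, '/usr/bin/find', '/some/dir', '-type', 'f' );
#   process_candidates( 14, sub { unlink $_[0] }, sub { system( 'gzip', $_[0] ) }, @files );
```

    Note that -M measures age relative to when the script started, not to the moment of the test, so a very long run does not make files look older as it goes.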



    I also hope you only have to deal with NFS to the filer, because SMB oplocks might cause you pain.