PerlMonks  

Best Way to Search and Delete files on a large Windows Filesystem

by dru145 (Friar)
on Mar 08, 2002 at 17:35 UTC ( [id://150377]=perlquestion )

dru145 has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

Wow, I haven't had to post a question in a while. I guess I'm almost half a monk : )

Anyway, I need some help with a Perl script I wrote for a fellow team member. Here is a little background. They were going through a large file server (98GB), manually finding files with certain extensions (.exe, .nsf, .mp3) and deleting them. They asked me to write a Perl script to do this automatically, which I was happy to do. They also asked if I could add something that shows how far along the script has gotten while it runs, so they can tell it hasn't locked up (hence the print statements).

My question is, do the Locate_Files sub and delete sub print out all of the matches they find AND then go back through the filesystem and add the files to the array? Or do they print out a match, add that file to the array and move on to the next file? If it is the former, what is the best way to do this? The script seemed to lock up the first time we ran it, and I'm thinking this could have something to do with it, since it might be going through the server printing out all of the matches, then going back again and adding them to an array, which seems like a stupid way of doing it.

You might also be wondering why I'm using File::Recurse. Well, I couldn't get File::Find to print out the absolute path of the file (e.g. C:\data\test\ie.exe). Instead it would just print out the file name (ie.exe). I needed the whole path to be included in the email.

TIA for any help.
use File::Recurse;
use Mail::Sender;
use strict;
use warnings;

my @files;
my $addresses = 'jsmith@acme.com, jdoe@acme.com';
my $from_addr = 'perlscript@fileserver.acme.com';

# Search the directory tree for the file types specified
recurse(\&Locate_Files, "E:\\data");

# delete the files
&delete();

# Send an email stating which files were deleted
&send_email();

#################
#  Subroutines  #
#################
sub Locate_Files {
    if ( -f $_ ) {
        print "Found a match: $_\n" if ( $_ =~ /\.nsf$|\.exe$/i);
        push (@files,"$_") if ( $_ =~ /\.nsf$|\.exe$/i);
    }
} #end Locate_Files sub

sub delete {
    foreach (@files){
        print "Deleting the following file: $_\n";
        unlink($_) || warn "Can not delete file $_: $!\n";
    }
} #end delete sub

sub send_email {
    my $sender;
    ref ($sender = new Mail::Sender({from => "$from_addr", smtp => "apccorp"}))
        or die "$Mail::Sender::Error\n";
    $sender->Open({to => "$addresses", subject => "Deleted Files Report"});
    $sender->SendLine;
    $sender->Send(<<"END");
Team,

Here is the list of files that were deleted the last time the Perl script
was run on the data folder:

@files
END
    $sender->Close;
} #end send_email sub

Thanks,
Dru
Another satisfied monk.

Replies are listed 'Best First'.
Re: Best Way to Search and Delete files on a large Windows Filesystem
by demerphq (Chancellor) on Mar 08, 2002 at 17:59 UTC
    OK. The problem you had with File::Find (which I personally would feel more comfortable with; I've never even heard of File::Recurse) is that you didn't use the no_chdir option and/or you didn't use $File::Find::name. (Actually, I have serious reservations about using a module that isn't in the standard distro when there is one in the standard distro that does the same thing. There's good reason why it got into the standard distro in the first place.)
    use strict;
    use warnings;
    use File::Find;

    # Extensions to match
    my @exts=qw(.nsf .exe);
    my $search_root='D:\\Bin\\'; # where to start the search

    # Build a regex
    my $rexstr=join'|',map {quotemeta $_} @exts;
    my $rex=qr/(?:$rexstr)$/i;

    # list of filespecs
    my @list;

    # Find em thanks
    find({ wanted =>sub{push @list,$_ if (/$rex/ && -f)} , no_chdir => 1 },
         $search_root);

    # And print em out
    print join "\n",@list;
    Adding the email code and etc is left as an exercise for the read... ;-)

    BTW: I've found there is a cute little trick with File::Find under Windows. If you use MS-style paths (i.e. backslashes) and don't take advantage of Perl's ability to handle either, then File::Find will return paths like

    D:\Bin\/autoruns.exe
    This means that you can easily tell where the children's path starts by looking for slashes and not backslashes. (Don't worry about \/, Perl handles it transparently.) But be aware that using the trick makes your code completely unportable.

    HTH

    Yves / DeMerphq
    --
    When to use Prototypes?
    Advanced Sorting - GRT - Guttman Rosler Transform

      I use File::Spec->canonpath($File::Find::name) to get cleaned up paths on Win32. Note that Cygwin and ActiveState handle paths a little bit differently. I generally prefer the way ActiveState does it.
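A minimal sketch of the canonpath cleanup mentioned above, applied to the mixed-separator path from the earlier reply. It calls File::Spec::Win32 directly, which is an assumption made here so the result is the same on any platform; on a Windows Perl, plain File::Spec->canonpath should dispatch to the same code.

```perl
use strict;
use warnings;
use File::Spec::Win32;

# A mixed-separator path of the kind File::Find can return on Windows
# when the search root is given with backslashes.
my $raw = 'D:\Bin\/autoruns.exe';

# Normalise the separators using the Win32 rules explicitly.
my $clean = File::Spec::Win32->canonpath($raw);

print "$clean\n";
```

With the Win32 rules the doubled separator collapses into a single backslash, so the cleaned path can go straight into a report or an email.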
        Yeah, I've done the same thing, although I generally use regexes if the output is from File::Find. But my point was that using the path as returned is fine and that taking advantage of the backslash change can be quite useful, for instance for creating a HoH of the directory structure. You can just do something like
        my($root,$path)=$funnyspec=~m!([^/]+)([^\\]+)!;
        my @parts=split('/',$path);
        (well YKWIM)

        Anyway, I have to admit that I almost never use my cygwin version of Perl.

        Yves / DeMerphq
        --
        When to use Prototypes?
        Advanced Sorting - GRT - Guttman Rosler Transform

      perl experts,
      Can you show the whole code when you get it working so I can learn from it?
Re: Best Way to Search and Delete files on a large Windows Filesystem
by perrin (Chancellor) on Mar 08, 2002 at 17:52 UTC
    To get the full path with File::Find, just use $File::Find::name. It's right there in the docs. I am using this on Win32 with no problems.

    However, there's nothing really wrong with what you're doing here. Your Locate_Files sub (loss of style points for the init caps) doesn't need to run that regex twice, but that won't save you much. It's not going through the whole file system twice as you feared.
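A rough sketch of the two points above, assuming File::Find is used instead of File::Recurse: the callback tests each file once, prints the match and pushes it in the same step, and takes the full path from $File::Find::name. The helper name locate_files and its argument order are made up for illustration.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: walk $root once and return the full paths of
# files whose names end in one of the given extensions.
sub locate_files {
    my ($root, @exts) = @_;
    my $alternatives = join '|', map { quotemeta $_ } @exts;
    my $rex = qr/(?:$alternatives)$/i;

    my @found;
    find(
        sub {
            # $_ is the bare file name here; $File::Find::name is the full path.
            return unless -f $_ && /$rex/;
            print "Found a match: $File::Find::name\n";
            push @found, $File::Find::name;
        },
        $root,
    );
    return @found;
}

# Example call (the root is the one from the original post):
# my @files = locate_files('E:/data', '.nsf', '.exe');
```

Each file is visited exactly once, so the progress output and the list grow together during a single pass over the tree.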

Re: Best Way to Search and Delete files on a large Windows Filesystem
by OzzyOsbourne (Chaplain) on Mar 08, 2002 at 18:20 UTC

    I have a system for this. The caller script opens a bunch of instances of the main script (1 per server). The main script hunts down all the media files on the servers and puts them into logs called nameofserver.log. Then I run the siftall script to go through all of those logs to sift into logs by file type. Then the deleter script will delete every file listed in a particular log.

    The caller script is where the performance gains are made. It allows me to run 100+ servers simultaneously, rather than sequentially. Most of the scripts use strict. I apologise for those that don't. They were written long ago and never rewritten.

    Let me know if you need more info on this...

    Main Script

    #finds .mp3.avi.exe.mpg.mpeg.mpe.wav.zip.mov.rmj.wma files on a server
    #called from the run multi scripts
    #added swf 5/17/01
    #added pst 7/27/01
    # ADDED ANOTHER NESTED IF FOR //SERVER/USERS
    # 8.09.01 added .ogg (ogg-vorbis files)

    use Getopt::Std;
    use File::Find;

    getopt('s');

    # *********************************
    # Process arguments ([h]elp,[s]erver)
    # *********************************
    if ($opt_s){
        $server=$opt_s;
    }else{
        print "Please Enter Server name:";
        chomp($server=<STDIN>);
    }

    $dir1="//$server/e\$/users";
    if (!(-e "$dir1")){#if directory doesn't exist try d$
        $dir1="//$server/d\$/users";
        if (!(-e "$dir1")){
            $dir1="//$server/users";
            if (!(-e "$dir1")){
                die "Directory does not exist on $server\n...Exiting Script.\n";
            }
        }
    }

    $out="//workstation/share/serverlogs/$server\.tmp";
    $out2="//workstation/share/serverlogs/media/$server\.txt";

    open (OUTFILE, ">$out") or die "Cannot open $out for write :$!";
    print "finding media files on $dir1\.\.\.\n";

    find ({wanted => \&wanted, no_chdir=>1}, $dir1);
    #find(\&wanted, $dir1);

    sub wanted {
        if (!("$File::Find::dir"=~/}/)&&(/\.asf$|\.mp.{0,2}$|\.avi$|\.exe$|\.wav$|\.zip$|\.mov$|\.rm.?$|\.wm.?$|\.qt$|\.mid.?$|\.ra.?$|\.swf$|\.pst$|\.ogg$|\.gho$/i)){
            print OUTFILE "$_\n";
            print "$_\n";
        }
    }

    close OUTFILE;

    # Rewrite the temporary log with backslashes instead of slashes
    open (OUTFILE, "$out") or die "Can't open";
    open (OUTFILE2, ">$out2") or die "Can't open";
    @input=<OUTFILE>;
    foreach (@input){
        s/\//\\/g;
        print OUTFILE2 "$_";
    }
    close OUTFILE;
    close OUTFILE2;

    unlink $out;

    Calls the Main script

    # Created on 9/6/00
    @all=('SERVER1','SERVER2');

    use Win32::Process;

    sub ErrorReport{
        print Win32::FormatMessage( Win32::GetLastError() );
    }

    # Start one instance of the main script per server, without waiting
    foreach $server (@all){
        Win32::Process::Create($ProcessObj,
            "c:\\program files\\perl\\bin\\perl.exe",
            "perl.exe c:\\public\\perl5\\cleanup\\findstuff6.pl -s $server",
            0,
            NORMAL_PRIORITY_CLASS,
            ".")|| die ErrorReport();
        #$ProcessObj->Suspend();
        #$ProcessObj->Resume();
        #$ProcessObj->Wait(INFINITE);
    }

    Log Sifter

    # sifts all the server logs based on media type
    # added func to print -ok at end of empty files, Federal added 5/31/01
    # added mpga on 6.08.01
    # added pst on 7.27.01
    # added a backup of old logs a MMDDYY directory based on the last exe.txt time stamp

    use strict;
    use File::Copy;

    my ($type, $server,$out,$in,@input,$total,$kbytes,$mbytes);
    my @servers=('SERVER1','Server2');
    my $dir1='//workstation/share/serverlogs/media';
    my @types=('swf','asf','avi','mp2','mp3','mpg','mpga','mpe','mpeg','wav','mov','qt','mid','midi','ra','ram','rmi','rmj','rmx','zip','exe','wma','pst','ogg','gho');

    ######################
    # Create a new directory with the date of the last exe.txt MMDDYY under sifted
    ######################
    my @statarray=stat('c:/share/serverlogs/media/sifted/exe.txt');
    my @statarray2=localtime($statarray[9]);
    my $month=$statarray2[4]+1;
    my $year=$statarray2[5]-100;
    my $dirname=sprintf("$dir1/sifted/%.2d%.2d%.2d",$month,$statarray2[3],($statarray2[5]-100));

    ######################
    # Backup the old logs to the new dir
    ######################
    opendir(DIR, "$dir1/sifted") or die "can't opendir $dir1: $!";
    my @files = grep { !/^\./ && -f "$dir1/sifted/$_" } readdir(DIR);
    closedir DIR;

    mkdir ("$dirname")or die "Couldn't mkdir! $!";
    foreach (@files){
        copy("$dir1/sifted/$_","$dirname")or die "Couldn't Copy! $!";
    }
    unlink <$dir1/sifted/*.txt>;

    ######################
    # Sort the logs
    ######################
    foreach $type (@types){
        $total=0;
        my $out="$dir1/sifted/$type\.txt";
        my $out2="$dir1/sifted/$type-ok\.txt";
        open OUT, ">$out" or die "Cannot open $out for write :$!";
        foreach $server (@servers){
            $in="$dir1/$server\.txt";
            open IN,"$in" or next;
            @input=<IN>;
            chomp @input;
            foreach (@input){
                if (/\.$type$/i){
                    $kbytes = (stat)[7]/1024;
                    $total+=$kbytes;
                    print OUT "$_\t$kbytes KB\n";
                }
            }
            close IN;
        }
        $mbytes=$total/1024;
        print OUT "\n\nTotal: $mbytes MB\n";
        close OUT;
        # Mark logs with nothing in them by renaming to <type>-ok.txt
        if ($mbytes == 0){
            rename $out, $out2;
        }
        print "Finished $type...\n";
    }

    Delete files by log

    #deletes MP3 files noted in the sifted MP3 log
    # usage: deletemp3.pl mp3[enter]

    use strict;

    my $infile="//workstation/share/serverlogs/media/sifted/$ARGV[0].txt";
    my $outfile="//workstation/share/serverlogs/media/sifted/$ARGV[0]-deleted.txt";
    my %filehash;

    open IN, "$infile" or die "Cannot open $infile for read :$!";
    my @input=<IN>;
    close IN;

    foreach (@input){
        my ($file,$size)=split /\t/;
        $filehash{$file}=$size;
    }

    foreach (sort keys %filehash){
        if(-e){
            # Only report the file as deleted when unlink succeeded
            if (unlink "$_"){
                print "$_ deleted\n";
            }else{
                warn "\ncan't delete $_:$!\n";
            }
        }else{
            print"file does not exist\n"
        }
    }

    rename $infile, $outfile;

    -OzzyOsbourne

Re: Best Way to Search and Delete files on a large Windows Filesystem
by vladb (Vicar) on Mar 08, 2002 at 17:55 UTC
    Hmm, I never hoped to ever see 'the best' way of doing anything, really. So I should consider myself extremely lucky today to have come across your node ;). Actually, I don't think there's 'the best' way in anything. There are some good ways (including yours here), yet not 'the best'.

    A few suggestions on the code. You should probably consider staying away from 'global' variables such as @files, $addresses, etc. With the exception of rare cases you should never use global variables. Change your send_email() and delete() subs to accept parameters, something along these lines:
    sub delete {
        # accept a reference to an array (containing path values
        # to files to be removed)
        my $files_aref = shift;
        foreach (@$files_aref) {
            # ... rest of your code ...
        }
    }
    # ... do something similar with the rest of your subs ...
    That was my 2 cents ;).
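One way the parameter-passing suggestion could look when filled in; this is a sketch, not code from the thread. The name delete_files is hypothetical (chosen to avoid clashing with Perl's built-in delete), and returning the list of files that were actually removed is an added assumption so the email step can report only those.

```perl
use strict;
use warnings;

# Hypothetical replacement for the original delete() sub: it takes the
# list of paths as an array reference and returns a reference to the
# list of files that were actually removed.
sub delete_files {
    my ($files_aref) = @_;
    my @deleted;
    foreach my $file (@$files_aref) {
        print "Deleting the following file: $file\n";
        if (unlink $file) {
            push @deleted, $file;
        }
        else {
            warn "Cannot delete file $file: $!\n";
        }
    }
    return \@deleted;
}

# Example wiring; @files would come from the search step:
# my $deleted = delete_files(\@files);
# send_email($deleted);   # send_email would take the list the same way
```

Because nothing is read from file-scoped variables, the sub can be exercised on a scratch directory before it is pointed at the real server.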

    "There is no system but GNU, and Linux is one of its kernels." -- Confession of Faith
OT - If you're my sysadmin, I'd better clean up.
by t'mo (Pilgrim) on Mar 08, 2002 at 18:35 UTC

    I find the most interesting part of this is the contents of the regexen you're all using to find the files to delete. You guys don't work at my company, do you? What will I do with all my most important files?    :-)
