I wrote a horribly slow solution to a terribly easy problem. Then I wrote a different solution which ended up even slower. Yes I'm a neophyte. And yes this is Windows, but I think I can do better with the help of some mad monks and I don't want to let PERL down by living with my original poor attempts. Not looking for anyone to code this for me, just to suggest the best approach of how to do it. Anyway...
Problem:
Given a file which has a listing of potential files, one per line. Some on the same server, some in the same directory, some which really exist, some which may not. Example input:
\\serverA\directoryA\FileA
\\serverA\directoryA\FileB
\\serverA\directoryB\subdirectoryA\FileA
\\serverA\directoryB\subdirectoryA\FileB
\\serverB\directoryA\FileA
\\doesntexist\doesntexist\doesntexist
The PERL script should validate if the file exists or not. If it does, it should write the size and date modified attributes for the file, along with the file name. The 3 pieces of information should be pipe delimited. If it does not exist, mark the file name anyway, but put "notfound" for the 2 attributes. So for the input above, the output should look something like this:
\\serverA\directoryA\FileA|25521|10/19/2011 01:32 PM
\\serverA\directoryA\FileB|14288|07/28/2011 09:31 AM
\\serverA\directoryB\subdirectoryA\FileA|13384|07/28/2011 09:30 AM
\\serverA\directoryB\subdirectoryA\FileB|15667|08/05/2011 08:53 AM
\\serverB\directoryA\FileA|15274|08/11/2011 03:15 PM
\\doesntexist\doesntexist\doesntexist|notfound|notfound
Here is my first attempt. running this on an input file of 20,000 entries to just over 3 minutes, which I think is piss poor slow:
use POSIX; my $timestamp; # eg 10/24/2011 5:01:22 PM my $size; # size of file will be bytes my @inp; #array for input files my $outfile; #second argument the output my $statfile; #third argument the status # open status and report in progress. Will abort if status file alread +y here if (-e $ARGV[2]) { die "status file already exists!"; } open(STAT, ">", $ARGV[2]) || die "can't create statfile $ARGV[2]"; print STAT "inprogress"; close(STAT); #quit unless we have the correct number of command-line args my $num_args = $#ARGV + 1; if ($num_args != 3) { print "\nUsage: perl sizeDateValidator.pl <listOfFilesToValidate> <out +putFile> <statusFile>\n"; open(STAT, ">", $ARGV[2]) || die "can't create statfile $ARGV[2]"; print STAT "error1"; close(STAT); exit; } #grab files to validate from first argument which should be a readable + file if (-e $ARGV[0]) { open(FILE, "<", $ARGV[0]) || die "Can't open file $ARGV[0]"; @inp = <FILE>; chomp @inp; close FILE; } else { open(STAT, ">", $ARGV[2]) || die "can't create statfile $ARGV[2]"; print STAT "error2"; close(STAT); exit; } #check to make sure output file doesn't already exist like from previo +us run. #don't want to overwrite anything I shouldn't $outfile = $ARGV[1]; if (-e $outfile) { open(STAT, ">", $ARGV[2]) || die "can't create statfile $ARGV[2]"; print STAT "error3"; close(STAT); exit; } #for each file in list, get size and timestamp #if the element is not a file, report this #write this to output file passed in as argument. open(FILE, ">", $outfile) || die "Can't open file $outfile"; foreach my $files(@inp){ $files =~ s/"//g; if (-f $files) { $timestamp = POSIX::strftime("%m/%d/%Y %I:%M %p", localtime(( stat $fi +les)[9])); # or change localtime to gmtime $size = (stat $files)[7]; print FILE "$files|$size|$timestamp\n"; } else { print FILE "$files|notfound|notfound\n"; } } close(FILE); open(STAT, ">$ARGV[2]") || die "can't create statfile $ARGV[2]"; print STAT "success"; close(STAT);
my $num_args; # for inp arg validation my $statfile; #file for reporting progress of execution; third arg my $inp; #input file; first arg my %tempdir; #hash for storing unique file paths my $foo; #%tempdir keys my $dircommand2; #dir /-C my @rawinp2; #array to hold results of dir command my $direntry2; # for individual @rawinp2 processessing my $stripfile; # stripped down entry my $timestamp2; #catch regex results my $bytes2; #catch regex results my $file_name2; #catch regex results my @array_of_filestats; # holding output of stats e.g. \\swstgsqldb01\ +D$\mktemp\AnaWeekly_20110817.docx|13506|08/17/2011 02:42 PM my @inpobjects; # holding input objects to compare to my $outfile; #output file; third arg my $trimmed; # elements from @array_of_filestats my $count; # counter to check for non-existing targets my @pieces; # split results file entries #Will abort if status file already here if (-e $ARGV[2]) { die "status file already exists!"; } # open status and report in progress $statfile = $ARGV[2]; open(STAT, ">", $statfile) || die "can't create statfile $statfile"; print STAT "inprogress"; close(STAT); #quit unless we have the correct number of command-line args $num_args = $#ARGV + 1; if ($num_args != 3) { print "\nUsage: perl sizeDateValidator.pl <listOfFilesToValidate> <out +putFile> <statusFile>\n"; open(STAT, ">", $statfile) || die "can't create statfile $statfile"; print STAT "error1"; close(STAT); exit; } #Get every unique path from input file %tempdir; $inp = $ARGV[0]; open(INP, $inp)||die "Can't open: $inp"; while(<INP>){ chomp; ($stripfile) = /(.*\\).*$/; $tempdir{$stripfile}++; } close(INP); #run our dir command on each of the individual keys and store in array foreach $foo (keys %tempdir) { $dircommand2 = "dir /-C "; if (-e $foo) { $dircommand2 = $dircommand2 . $foo; @rawinp2 = `$dircommand2`; foreach $direntry2(@rawinp2) { next if $direntry2 =~ /<DIR>/; #next if !($timestamp = ($direntry =~ /^\d{2}\/\d{ +2}\/\d{4}\s+\d{2}:\d{2}\s[AP][M]/g)[0]); #if it doesn's start MM/DD/Y +YYY then skip next if !(($timestamp2, $bytes2 , $file_name2 ) = +(($direntry2 =~ /^(\d{2}\/\d{2}\/\d{4}\s+\d{2}:\d{2}\s[AP][M])\s+(\d* +)\s+(.*)$/)[0,1,2])); $file_name2 = $foo . $file_name2; $timestamp2 =~ s/\s{2}/ /; push (@array_of_filestats, "$file_name2|$bytes2|$t +imestamp2\n"); } } } # get array of input objects open(ARGINP, $inp)||die "Can't open: $inp"; while(<ARGINP>){ chomp; push(@inpobjects,$_); } close(INP); #check to make sure output file doesn't already exist like from previo +us run. #don't want to overwrite anything I shouldn't $outfile = $ARGV[1]; if (-e $outfile) { open(STAT, ">", $statfile) || die "can't create statfile $statfile"; print STAT "error3"; close(STAT); exit; } # if matches are made, print stats to output file; if not record "notf +ound" open(FILE, ">", $outfile) || die "Can't open file $outfile"; for ($i = 0; $i < @inpobjects; $i++) { chomp $inpobjects[$i]; $count = 0; foreach $trimmed(@array_of_filestats) { $count++; chomp $trimmed; @pieces = split(/\|/, $trimmed); #print "inpobject: $inpobjects[$i]\npieces: $pieces[0]\n\n"; if ($inpobjects[$i] eq $pieces[0]){ print FILE "$trimmed\n"; last; } else { if (($#array_of_filestats + 1) == $count) { print FILE "$inpobjects[$i]|notfound|notfound\n"; } } } } close(FILE); open(STAT, ">", $statfile) || die "can't create statfile $statfile"; print STAT "success"; close(STAT);
Any monks out there that can suggest a better, faster approach?
many thanks in advance
In reply to sizeDateValidator.pl is horribly slow by msensay
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |