Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Monks,

I have a script that scans an input list of user directory locations and reports the name and size of each user directory. The information is written to a CSV file. At the end of the script, I call a program I wrote in VB that imports that CSV file into a table in an MS Access database.

My problem is that my "output.csv" file is incomplete. The script usually gets only about half of the information into the CSV file, even though it runs to completion without any errors. I should mention that I am scanning a very large volume, and the script often takes 20+ hours to run.

Would calling sleep before I run my VB program help at all? Or, the better question: what do I need to do to ensure that my entire output is recorded in "output.csv"? Here is my code:
```perl
#!/usr/bin/perl
use File::Find;
use strict;
use Time::localtime;
use File::Spec;

my $cur = File::Spec->curdir;
my $up = File::Spec->updir;
my $dirtot;
my ($dir, @parts, $error, $starttime, $runtime, $endtime, $runmin, $year,
    $month, $day, $tm, $date, $rootdir, $userdir, $slash, $count, $runhour,
    $starttime, $runtime, $endtime, $runmin, $year, $month, $day, $tm, $date);
$starttime = (time);
$slash = "//";

#Opens the input and output files
open(IN, "< input.txt") or die("Couldn't open input.txt\n");
open(OUT, "> output.csv") or die("Couldn't open output.csv\n");

#Separates the newline delimited rows
chomp ( @parts = <IN> );

#Used to calculate how long the program took to run
$starttime = (time);

#Prints headings
print OUT "Path,User,MB,Date,Error\n";

#Displays total for subdirectories
foreach my $start (@parts) {
    my @dirs = &find_subdirs($start);
    foreach my $dir (@dirs) {
        $tm = localtime;
        $year = $tm->year+1900;
        $month = $tm->mon+1;
        $day = $tm->mday;
        $date = "$month-$day-$year";
        ($rootdir,$userdir) = split /$slash/,$dir;
        print "\tWalking $dir\n";
        my $total = 0;
        find sub { $total += -s}, $dir;
        $dirtot = $dirtot + $total;           #Used for determining total directory size
        $total = ($total / 1024) / 1024;      #Converts to MB
        $total = sprintf("%0.2f", $total);    #Formats output to 2 decimals
        print OUT "$rootdir,$userdir,$total,$date,$!\n";
    }
    $dirtot = ($dirtot / 1024) / 1024;        #Converts to MB
    $dirtot = sprintf("%0.2f", $dirtot);      #Formats total to 2 decimals
    print STDERR "$start: No subdirectories\n" unless @dirs;
}

$endtime = (time);
$runtime = $endtime - $starttime;
$runmin = $runtime/60;
$runmin = sprintf("%0.1f", $runmin);
$runhour = $runmin/60;
$runhour = sprintf("%0.1f", $runhour);
if ($runtime <= 60) {print "\n\nCompleted processing in $runtime seconds\n\n";}
elsif ($runmin <= 60) {print "\nCompleted processing in ". $runmin . " minutes\n\n";}
else {print "\nCompleted processing in ". $runhour ." hours\n\n";}

# This closes output file and then calls an external executable that
# will insert CSV output into an MS Access database table
close(OUT);
system "csv2access.exe";
print "\nOutput created.";

sub find_subdirs {
    my $start = shift;
    unless(opendir(D, $start)) {
        warn "$start: $!\n";
        next;
    }
    my @dirs = map {
        -d "$start/$_" && !-l "$start/$_" && $_ ne $cur && $_ ne $up
            ? "$start//$_" : ()
    } readdir(D);
    closedir(D);
    @dirs;
}
```

Any ideas?
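On the sleep question: Perl's system waits for the command to finish, and close flushes Perl's buffered output, so because close(OUT) already runs before system "csv2access.exe", a sleep should not change anything. What the script never does is check whether its writes succeeded. A minimal sketch that does, using hypothetical helpers write_csv and csv_field; quoting fields that contain commas is an extra precaution, on the assumption that some directory names might contain commas and shift the columns.

```perl
use strict;
use warnings;

# Hypothetical helper: quotes one CSV field if it contains a comma,
# double quote, or line break (doubling any embedded quotes).
sub csv_field {
    my ($f) = @_;
    $f = '' unless defined $f;
    return $f unless $f =~ /[",\r\n]/;
    $f =~ s/"/""/g;
    return qq{"$f"};
}

# Hypothetical helper: writes each row (an array ref) to $path as one CSV
# line and dies if any print or the final close fails, so a short file is
# reported instead of passing silently.
sub write_csv {
    my ($path, @rows) = @_;
    open my $out, '>', $path or die "Couldn't open $path: $!\n";
    for my $row (@rows) {
        print {$out} join(',', map { csv_field($_) } @$row), "\n"
            or die "Write to $path failed: $!\n";
    }
    close $out or die "Couldn't close $path: $!\n";
    return scalar @rows;
}
```

Checking the return value of close matters most here: buffered data is only written out at that point, so a failure (for example, a full disk) would otherwise go unnoticed.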

Re: Output isn't complete
by waswas-fng (Curate) on Jul 11, 2003 at 15:55 UTC
    I wonder how large the filesystems you are searching with this script are. I ask because I do this kind of thing quite often on very large filesystems (4 TB+), and it does not take anywhere near 20 hours to run (I think the last time I timed it, it took less than 15 minutes). Also note that you can merge find_subdirs and the other find into one pass; that way you can use the special "_" filehandle, which reuses the result of the last stat, to speed up your file tests across the board.
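    A minimal sketch of that single-pass idea, assuming (as the original script does) that every immediate subdirectory of an input path is one user directory. user_dir_sizes is a hypothetical name. It does one lstat per entry and then reuses that result through "_" for the -d, -f and -s tests, instead of listing each directory with opendir and then running a separate find per user. Using lstat also means symlinked entries are neither -d nor -f, which mirrors the original's !-l check.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Splits a path into its non-empty components, so "/a/b", "C:\a\b" and
# "\\server\share\a" are all counted consistently.
my $parts_of = sub {
    grep { length } File::Spec->splitdir(File::Spec->canonpath($_[0]));
};

# Hypothetical helper: returns a hash ref mapping each immediate
# subdirectory of $root (assumed to be one user directory) to the total
# size in bytes of the regular files beneath it, in one traversal.
sub user_dir_sizes {
    my ($root) = @_;
    my @root_parts = $parts_of->($root);
    my $depth = @root_parts;                    # components in $root itself
    my %size;
    find({
        no_chdir => 1,                          # work with full paths
        wanted   => sub {
            my @parts = $parts_of->($File::Find::name);
            return if @parts <= $depth;         # the root directory itself
            my $user = $parts[$depth];          # first component below root
            lstat($File::Find::name) or return; # one stat call per entry
            if (@parts == $depth + 1) {
                # Immediate child of the root: record real (non-symlink)
                # directories so empty user dirs still report 0 bytes.
                $size{$user} = 0 if -d _ && !exists $size{$user};
                return;
            }
            # Deeper entry: add regular-file sizes, reusing the lstat via _.
            $size{$user} += (-s _) || 0 if -f _;
        },
    }, $root);
    return \%size;
}
```

    Called once per line of input.txt (my $sizes = user_dir_sizes($start);), it returns bytes per user name, which can still be converted to MB with sprintf("%0.2f", $bytes / 1024 / 1024) before printing to the CSV.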

    -Waswas
      I should have mentioned that I am very new to Perl. I am not running this on anything larger than 250 GB. The speed problem is likely due to my inexperience with Perl.

      If you don't mind my asking, how would I go about merging find_subdirs and the find call together? If I could cut the run time of this script even in half, I would be ecstatic.
        Just so I know exactly what you are dealing with, could you give me a few lines from the input file? After that I will have time this afternoon to refactor your code and see how much faster I can get it.

        -Waswas
Re: Output isn't complete
by Anonymous Monk on Jul 11, 2003 at 15:51 UTC
    I don't know if I need to mention this or not, but I am running ActivePerl on a Win2k machine.
Re: Output isn't complete
by bm (Hermit) on Jul 14, 2003 at 14:35 UTC
  • use warnings;
  • use strict;
  • use File::Find;

    Once you are using those modules, a debug statement in your wanted sub may help you isolate which part of the filesystem is confusing your script: something like your print "\tWalking $dir\n" line, but including a timestamp.
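    A minimal sketch of that kind of logging, with warnings and strict enabled. walk_with_log is a hypothetical name; it sums file sizes like the original anonymous sub, and it also writes a timestamped "Walking" line for every directory it enters, so a slow or stuck subtree shows up as a large gap between consecutive timestamps.

```perl
use strict;
use warnings;
use File::Find;
use POSIX qw(strftime);

# Hypothetical helper: sums the sizes of regular files under $dir and
# writes a timestamped "Walking ..." line to $log_fh for every directory
# entered, so a slow subtree shows up as a gap in the timestamps.
sub walk_with_log {
    my ($dir, $log_fh) = @_;
    my $total = 0;
    find(sub {
        if (-d $_) {
            printf {$log_fh} "%s\tWalking %s\n",
                strftime('%Y-%m-%d %H:%M:%S', localtime), $File::Find::name;
        }
        elsif (-f _) {                # reuse the stat done by -d above
            $total += (-s _) || 0;
        }
    }, $dir);
    return $total;
}
```

    Inside the existing loop, my $total = walk_with_log($dir, \*STDERR); would replace the find sub { $total += -s }, $dir; line and send the log to the console while the CSV output stays unchanged.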

    20 hours is a very long execution time. I think you need to concentrate on working out where the bottleneck is. This can be done through debugging/logging.
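    One way to look for that bottleneck is to time each user directory separately. A minimal sketch, assuming Time::HiRes (shipped with modern Perl) for sub-second timing; time_walks is a hypothetical name. It returns one [directory, seconds, bytes] record per directory, sorted slowest first.

```perl
use strict;
use warnings;
use File::Find;
use Time::HiRes qw(time);

# Hypothetical helper: times the size walk of each directory in @dirs and
# returns [dir, seconds, bytes] records sorted slowest first.
sub time_walks {
    my @dirs = @_;
    my @results;
    for my $dir (@dirs) {
        my $t0 = time();
        my $bytes = 0;
        # Same size sum as the original, reusing the -f stat via _.
        find(sub { $bytes += (-s _) || 0 if -f $_ }, $dir);
        push @results, [ $dir, time() - $t0, $bytes ];
    }
    return sort { $b->[1] <=> $a->[1] } @results;
}
```

    Printing the first few records (for example with printf "%8.1fs  %s\n", $_->[1], $_->[0]) shows whether a handful of user directories account for most of the 20 hours or whether the time is spread evenly.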