In the first try, you are calling stat numerous times on each file, and that's wasting some amount of time. Call stat once per file, and save all its information for your various actions.

As for how long it should take to scan 20,000 files, what sort of time span are you expecting, and what sort of evidence (what sorts of processes) lead you to expect that?

There are some other trivial oddities in your first script -- I expect they don't affect the timing much (if at all), but they detract from the overall coherence of the code. Oh, and consistent indenting is useful...

Here's how I would do it:

use POSIX;

# Get argv handling out of the way first...

if ( @ARGV != 3 or ! -f $ARGV[0] ) {
    die "Usage: perl $0 FileListToValidate OutFile StatusFile\n";
}

# Next take care of all the i/o file handling...

if ( -e $ARGV[2] ) {
    die "$ARGV[2] already exists -- I will not overwrite it\n";
}
open( STAT, '>', $ARGV[2] )
    or die "Can't write status info to $ARGV[2]: $!\n";

if ( ! open( OUT, '>', $ARGV[1] )) {
    print STAT "error: can't write output to $ARGV[1]: $!\n";
    exit;
}
if ( ! open( IN, '<', $ARGV[0] )) {
    print STAT "error: can't open $ARGV[0] for input: $!\n";
    exit;
}

# Now get to work...

my @inpList = <IN>;
chomp @inpList;

for ( @inpList ) {   # let $_ hold the file name
    tr/"//d;         # get rid of double-quotes
    my @stats = stat;  # do this just once (works on $_ by default)
    if ( ! @stats ) {  # empty list means stat failed
        print OUT join( '|', $_, ( 'notfound' ) x 2 ), "\n";
    }
    else {
        # $stats[7] is the size in bytes, $stats[9] is the mtime
        print OUT join( '|', $_, $stats[7],
                        POSIX::strftime( "%m/%d/%Y %I:%M %p",
                                         localtime( $stats[9] ))), "\n";
    }
}
print STAT "success\n";

That eliminates a lot of useless variable creations and value assignments, but I think reducing the multiple stat calls per file to just one will be the thing that has a noticeable effect.

Personally, I'd go with just two command line args -- printing error messages (and even a "success" message) to stderr should suffice, so you just need the input list and the name to use for the output list (and you eliminate two possible causes of failure).
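
For what it's worth, here is a minimal sketch of that two-argument variant. The sub names file_record and validate_list are made up for illustration, as is the wording of the messages; the record format is the same as in the script above. Everything that would have gone to the status file goes to STDERR instead.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: build one "name|size|mtime" record from a single stat call.
sub file_record {
    my ($name) = @_;
    $name =~ tr/"//d;               # get rid of double-quotes
    my @stats = stat $name;         # one stat per file; empty list means it failed
    return join( '|', $name, ('notfound') x 2 ) unless @stats;
    return join( '|', $name, $stats[7],
        POSIX::strftime( "%m/%d/%Y %I:%M %p", localtime( $stats[9] ) ) );
}

# Hypothetical driver: read file names from $list_file, write records to $out_file.
# Returns the number of names processed; dies (message on STDERR) on any i/o failure.
sub validate_list {
    my ( $list_file, $out_file ) = @_;
    open( my $in,  '<', $list_file ) or die "Can't open $list_file for input: $!\n";
    open( my $out, '>', $out_file )  or die "Can't write output to $out_file: $!\n";
    my $count = 0;
    while ( my $name = <$in> ) {
        chomp $name;
        print {$out} file_record($name), "\n";
        $count++;
    }
    close $out or die "Error closing $out_file: $!\n";
    return $count;
}

if ( @ARGV == 2 ) {
    my $count = validate_list(@ARGV);
    print STDERR "success: $count names checked\n";
}
else {
    # warn writes to STDERR, same as the success message above
    warn "Usage: perl $0 FileListToValidate OutFile\n";
}
```

Run it as "perl script.pl FileListToValidate OutFile 2> status.txt" if you still want the status captured in a file; the shell takes care of that, so the script has one less file to open.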

As for the second try, processing the output of some other command is bound to take longer (and can cause more trouble). Don't do that when a Perl built-in function can do the same thing.
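
If you want to see the difference for yourself, something along these lines would do. This is only a sketch: I'm using "ls -ld" as a stand-in for whatever external command gets run per file (an assumption, and it needs a Unix-like system), and the sub names size_via_stat and size_via_ls are invented for the comparison. Benchmark is a core module.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Built-in: one system call, no extra process.
sub size_via_stat {
    my ($file) = @_;
    my @stats = stat $file;
    return @stats ? $stats[7] : undef;
}

# External command (assumed stand-in: POSIX "ls -ld"): starts a new process
# for every file, then has to parse its text output.
sub size_via_ls {
    my ($file) = @_;
    open( my $ls, '-|', 'ls', '-ld', $file ) or return undef;
    my $line = <$ls>;
    close $ls;
    return undef unless defined $line;
    return ( split ' ', $line )[4];    # fifth field is the size in "ls -l" output
}

# Give a file name on the command line to run the comparison.
if ( my $file = shift @ARGV ) {
    cmpthese( -1, {
        stat_builtin => sub { size_via_stat($file) },
        ls_external  => sub { size_via_ls($file) },
    } );
}
```

The exact numbers will depend on your machine, so I won't quote any, but the per-file cost of starting a process is what to look at when you have 20,000 files.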


In reply to Re: sizeDateValidator.pl is horribly slow by graff
in thread sizeDateValidator.pl is horribly slow by msensay
