I need to optimize the performance of a script that is supposed to scan a filesystem and collect information on *.msg files.

I made the script recursive, but I'm not sure this is the best way to implement it.
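For comparison, here is a minimal sketch of a non-recursive walk. It keeps pending directories on an explicit stack and builds full paths instead of calling chdir()/cwd() at every level. The name walk_msg_files and its callback interface are hypothetical. The tree here is only a handful of levels deep, so recursion itself is unlikely to be the bottleneck. The more probable costs are the chdir()/cwd() calls per directory and the two stat calls per .msg file (-d, then stat). This version calls lstat once per entry and reuses the cached result through the special _ filehandle. One consequence is that symbolic links to directories are not followed.

```perl
use strict;
use warnings;

# Hypothetical helper: visit every *.msg file under $root without recursion.
# $callback is called as $callback->($path, $size_in_bytes).
sub walk_msg_files {
    my ($root, $callback) = @_;
    my @stack = ($root);                       # directories still to visit
    while (defined(my $dir = pop @stack)) {
        opendir(my $dh, $dir) or do { warn "Unable to open $dir: $!\n"; next };
        while (defined(my $name = readdir($dh))) {
            next if $name eq '.' or $name eq '..';
            next if $name =~ /\.snapshot/;     # same skip as the original script
            my $path = "$dir/$name";
            my @st = lstat($path) or next;     # one stat call per entry
            if (-d _) {                        # reuse the cached lstat result
                push @stack, $path;            # visit later instead of recursing
            } elsif ($name =~ /\.msg\z/) {
                $callback->($path, $st[7]);    # $st[7] is the size in bytes
            }
        }
        closedir($dh);
    }
    return;
}
```

The callback receives each message's path and size, so the per-mailbox bookkeeping can stay outside the walker.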

Additionally, I look for the top (largest) mailboxes by sorting the hash keys.
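Once %top_size_mailbox is full, top_size_mailbox sorts all of its keys for every mailbox checked. Because the hash is keyed by size, two mailboxes of equal size also overwrite each other.

Here is a minimal sketch of an alternative, assuming N ($num_top_size_box) is small. It keeps an array of [size, msisdn] pairs sorted largest-first. Most mailboxes are rejected with a single comparison against the smallest entry, and the rest are inserted with splice. The helper name add_top_n is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the $limit largest mailboxes in @$top as
# [size, msisdn] pairs sorted by size, largest first. Ties are kept.
sub add_top_n {
    my ($top, $limit, $msisdn, $size) = @_;
    return if $limit < 1;
    # Fast path: list is full and this mailbox is no bigger than the smallest.
    return if @$top >= $limit && $size <= $top->[-1][0];
    # Linear scan for the insertion point; cheap because $limit is small.
    my $i = 0;
    $i++ while $i < @$top && $top->[$i][0] >= $size;
    splice(@$top, $i, 0, [$size, $msisdn]);
    pop @$top if @$top > $limit;               # drop the new smallest entry
    return;
}
```

Usage: declare `my @top;` and call `add_top_n(\@top, $num_top_size_box, $msisdn, $box_size)` wherever the script currently calls top_size_mailbox. At the end, @top already holds the pairs in descending order, so no final sort is needed.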

The directory structure is /test_vol/0/00/34567@test.com/1_sdr/34567.msg. The first level ranges over 0-9, the second over 00-99, and the number of mail accounts is huge.
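Because the top two levels are fixed, they could be enumerated with plain loops instead of being discovered through readdir and recursion. Only the account directories and what lies below them then need to be read.

This sketch rests on several assumptions:

- the levels are exactly 0-9 and 00-99;
- account directory names start with digits followed by @;
- *.msg files may sit in any subdirectory of an account (such as 1_sdr).

It uses the core File::Find module with no_chdir for the per-account walk and keeps the original script's 4096-byte minimum per message. The names scan_volume and mailbox_size and the callback are hypothetical. As a side effect, entries such as .snapshot at the volume root are never visited.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: total size of *.msg files below one account
# directory, counting any message under 4096 bytes as 4096.
sub mailbox_size {
    my ($dir) = @_;
    my $total = 0;
    find({
        no_chdir => 1,    # $_ holds the full path; File::Find does not chdir
        wanted   => sub {
            return unless /\.msg\z/ && -f $_;
            my $size = (stat _)[7];                  # reuse the -f stat result
            $total += $size < 4096 ? 4096 : $size;
        },
    }, $dir);
    return $total;
}

# Hypothetical driver: walk ROOT/<0-9>/<00-99>/<digits>@.../ directly.
# $on_mailbox is called as $on_mailbox->($msisdn, $box_size).
sub scan_volume {
    my ($root, $on_mailbox) = @_;
    for my $d1 (0 .. 9) {
        for my $d2 (map { sprintf '%02d', $_ } 0 .. 99) {
            my $bucket = "$root/$d1/$d2";
            opendir(my $dh, $bucket) or next;        # skip missing buckets
            while (defined(my $acct = readdir($dh))) {
                next unless $acct =~ /^(\d+)\@/;     # account dirs only
                my $msisdn = $1;                     # copy before any other match
                $on_mailbox->($msisdn, mailbox_size("$bucket/$acct"));
            }
            closedir($dh);
        }
    }
    return;
}
```

The callback receives 0 for an empty mailbox. That gives the caller what it needs to count empty boxes and feed non-empty ones into the top-N tracking.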

The code is:
    use Cwd;    # provides cwd()

    # Globals used below: $all_mailbox_count, $all_mailbox_size, $empty_mailbox,
    # $box_size, $msisdn, %top_size_mailbox, $num_top_size_box
    sub ScanDirectory {
        my ($workdir) = @_;
        my ($startdir) = &cwd;    # keep track of where we began

        chdir($workdir) or die "Unable to enter dir $workdir: $!\n";
        opendir(DIR, ".") or die "Unable to open $workdir: $!\n";
        my @names = readdir(DIR);
        closedir(DIR);

        foreach my $name (@names) {
            next if ($name eq ".");
            next if ($name eq "..");
            next if ($name =~ /\.dat$|\.mdb$|\.snapshot/);

            if ( -d $name ) {
                if ($name =~ /^\d+\@\w/ ) {    # entering a mailbox directory
                    $all_mailbox_count++;
                    $box_size = 0;
                }
                &ScanDirectory($name);
                next;
            }
            if ( $name =~ /\.msg$/ ) {
                my $msg_size = (stat($name))[7];
                # count every message as at least one 4 KB block
                if ( $msg_size < 4096 ) {
                    $box_size += 4096;
                } else {
                    $box_size += $msg_size;
                }
            }
        }

        if ( $workdir =~ /(\d+)\@/ ) {    # leaving a mailbox directory
            $msisdn = $1;
            $all_mailbox_size += $box_size;
            if ( $box_size == 0 ) {
                $empty_mailbox++;
            } else {
                &top_size_mailbox($msisdn, $box_size);
            }
        }
        chdir($startdir) or die "Unable to change to dir $startdir: $!\n";
    }

    sub top_size_mailbox {
        my ($msisdn, $box_size) = @_;
        # keyed by size, so two mailboxes with the same size overwrite each other
        if ( keys( %top_size_mailbox ) < $num_top_size_box ) {
            $top_size_mailbox{$box_size} = $msisdn;
        } else {
            my $min = (sort {$a <=> $b} keys %top_size_mailbox)[0];
            if ( $box_size > $min ) {
                delete $top_size_mailbox{$min};
                $top_size_mailbox{$box_size} = $msisdn;
            }
        }
    }

In reply to Optimizing performance for script to traverse on filesystem by gdanenb
