CHRYSt has asked for the wisdom of the Perl Monks concerning the following question:
Look, my first post. :)
Ok, I'm an uber perl noob, so go easy...
Anyway, I've got a script which uses File::Recurse to parse a directory and returns a very simple HTML file with a link to each directory that it finds. (It strips the directories higher than the server's document root.) This script appears to work fine on small directories. However, when I get into searching a really large directory with lots of subdirs, it takes a VERY long time (3+ hours) and apparently uses lots of memory. (The machine is a dual P3 600 with 640MB, so it's not a hardware issue.) Is there anything I can do to optimize this? Is there a better solution? Like I said, I'm a n00b, so this is based mostly off File::Recurse's README.
#!/usr/bin/perl
use File::Recurse;

print "Content-Type: text/html\n\n";
print "<html><body><h1>Web Server Directory Listing</h1>";

my %files = &Recurse(['/Intranet/html'], {});
my @dirs  = ();
my @files = ();
foreach (sort keys %files)
{
    @dirs = split(/\/Intranet\/html/, $_);
    print "<A href='@dirs'>@dirs</a><br>";
    foreach (@{ $files{$_} })
    {
        @files = (@files, $_);
    }
}
$f = @files;
$d = @dirs;
print "<br><br>";
print "$d total directories were found.";
print "$f total files were found.";
print "</body></html>";
Re: recurse directory script
by tamills (Acolyte) on Jan 09, 2002 at 19:31 UTC
Total Elapsed Time = -1.8e-00 Seconds
User+System Time = 100.0439 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c Name
99.9 100.0 100.00 1 100.00 100.00 main::f2
0.03 0.030 0.030 1 0.0300 0.0300 main::f1
sub f2 used the original method. sub f1 used push(). As you can see, it made quite a difference.
Drew
Be careful what you wish for... (Mr. Limpet)
For where your treasure is, there shall your heart be also (Christ)
Re: recurse directory script
by mkmcconn (Chaplain) on Jan 09, 2002 at 05:45 UTC
CHRYSt, you'll find many posts here directing you to File::Find. It's very easy to use (although there are some peculiarities that may put you off at first), and much clearer code can be written with it than you'll find in my sample below.
Download the snippet below, and see if it does pretty much what you want. I can hardly imagine it taking 3 hours,
even on a very large and deep directory.
If it works for you, take a closer look at the module and the examples in the documentation (perldoc File::Find).
#!/usr/bin/perl -w
use strict;
use File::Find;
print "Content-Type: text/html\n\n";
print "<html><body><h1>Web Server Directory Listing</h1>";
my $dir_count = 0;
my $file_count= 0;
find(sub {
    if (-d $_) {
        # File::Find chdirs into each directory, so we can
        # stat, copy or rename without needing to know
        # which directory we are in.
        print $dir_count > 0 ? "</ul>\n" : "";
        print "<h3>Directory ",
              ++$dir_count,
              ": <a href='$File::Find::dir/$_'>$_</a></h3>\n<ul>\n";
        # $File::Find::dir is the current directory.
    }
    else {
        print "<li>",
              ++$file_count,
              " <a href='$File::Find::name'>$_</a></li>\n";
        # $File::Find::name is the full path of the current file.
    }
}, '/Intranet/html');
print "\n</ul>";
print "</body></html>";
mkmcconn
fiddled with text after posting.
Re: recurse directory script
by dmmiller2k (Chaplain) on Jan 09, 2002 at 20:02 UTC
++tamills and ++LordAvatar for their helpful tips. In particular, tamills's advice should produce a massive performance boost, because in your code, @files = (@files,$_); replaces the entire contents of @files with its prior contents plus one new element, EVERY TIME you add to it.
The time it takes to add a single element this way grows with the number of elements already in the array, so building the whole list takes quadratic time overall, whereas push appends in roughly constant time.
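The difference is easy to see with the core Benchmark module. A minimal sketch (not the original poster's code, just the two append styles side by side):

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my $n = 2_000;

timethese(5, {
    copy => sub {
        my @a;
        @a = (@a, $_) for 1 .. $n;   # copies the whole array on every append
    },
    push => sub {
        my @a;
        push @a, $_ for 1 .. $n;     # appends in place
    },
});
```

Both loops build the same array; only the cost differs, and the gap widens quickly as $n grows.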
Mustn't forget the, ahem, obligatory standard advice:
As a rule, put -w on your shebang line, and use strict;, and take the time to remove any errors or warnings reported.
#!/usr/bin/perl -w
use strict;
use File::Recurse;
# ...
BTW, subroutine calls no longer require the & prefix character. Using them makes your code unnecessarily more difficult to read.
Update: fiddled with text.
dmm
You can give a man a fish and feed him for a day ...
Or, you can teach him to fish and feed him for a lifetime
BTW, subroutine calls no longer require the & prefix character. Using them makes your code unnecessarily
more difficult to read.
To the contrary, I find that prepending an & to
subroutine calls makes my code easier to read. For
one thing, it helps some syntax-highlighting editors highlight
the subs, which is helpful. For another, it keeps with the
theme of "everything with a sigil is a thingy that you can
take a reference to", which I find helpful as well. The
obvious caveat is that the & makes the current @_
visible to the subroutine; this hasn't bitten me in the ass
yet, though.
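That @_ caveat can be seen in a short sketch (hypothetical subs, just to show the behavior):

```perl
use strict;
use warnings;

sub inner { return join ',', @_ }

sub outer {
    # Called with & and no parens, inner sees outer's current @_;
    # called with parens, it gets an empty argument list.
    my $with_amp    = &inner;     # @_ passed through implicitly
    my $with_parens = inner();    # nothing passed
    return ($with_amp, $with_parens);
}

my ($a, $b) = outer(1, 2, 3);
print "$a | $b\n";    # prints "1,2,3 | "
```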
--
:wq
Re: recurse directory script
by LordAvatar (Acolyte) on Jan 09, 2002 at 19:35 UTC
Chryst,
Try removing the print "<A href='@dirs'>@dirs</a><br>"; line.
You will slow down performance greatly by printing large directory listings
to the screen.
-Lord Avatar
"A simple truth is but a complicated lie..." -Nietzsche
THANKS!! Re: recurse directory script
by CHRYSt (Acolyte) on Jan 09, 2002 at 23:12 UTC
Wow, didn't expect help so quick.
Using push, plus a slight change in the way it prints each link brought the time down to ~5 mins.
instead of print "<A href='@dirs'>@dirs</a><br>"; I used just print "<A href='$_'>$_</A><br>";
works now. Thanks again. :)
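Putting both changes together, the revised loop might look something like this (a sketch: the hypothetical %files hash stands in for File::Recurse's return value of directory => arrayref of files):

```perl
use strict;
use warnings;

# Stand-in for what &Recurse(['/Intranet/html'], {}) would return.
my %files = (
    '/Intranet/html'      => ['index.html'],
    '/Intranet/html/docs' => ['a.html', 'b.html'],
);

my @files;
foreach (sort keys %files) {
    print "<A href='$_'>$_</A><br>\n";   # print $_ directly, no split
    push @files, @{ $files{$_} };        # push, not a full array copy
}
my $f = @files;
my $d = keys %files;
print "<br>$d total directories were found. $f total files were found.\n";
```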
Re: recurse directory script
by Anonymous Monk on Jan 10, 2002 at 00:17 UTC
You need to dump the library and roll your own.
%files is eating up memory, since you are storing all the data for a while before using it. Write a breadth-first traversal routine and stream the output to the browser without holding more than one line at a time. Use a command pipe with a sorted ls as the stream. Use counters and -d/-f tests to print the stats. You can probably find a breadth-first traversal on the web.
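A minimal breadth-first sketch along those lines, using plain opendir/readdir and a queue instead of a command pipe (the '/Intranet/html' root is taken from the original post; everything else is illustrative):

```perl
use strict;
use warnings;

# Breadth-first walk that streams one line per directory instead of
# building a %files hash; only the queue of unvisited dirs is in memory.
sub bfs_listing {
    my ($root) = @_;
    my @queue  = ($root);
    my ($dirs, $files) = (0, 0);
    while (defined(my $dir = shift @queue)) {
        opendir my $dh, $dir or next;
        for my $entry (sort grep { $_ ne '.' && $_ ne '..' } readdir $dh) {
            my $path = "$dir/$entry";
            if (-d $path) {
                $dirs++;
                push @queue, $path;   # children wait until this level is done
                print "<a href='$path'>$path</a><br>\n";
            }
            else {
                $files++;
            }
        }
        closedir $dh;
    }
    return ($dirs, $files);
}

my ($d, $f) = bfs_listing('/Intranet/html');
print "<br>$d total directories were found. $f total files were found.\n";
```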