PerlMonks  

recurse directory script

by CHRYSt (Acolyte)
on Jan 09, 2002 at 04:06 UTC ( [id://137318] )

CHRYSt has asked for the wisdom of the Perl Monks concerning the following question:

Look, my first post. :)
Ok, I'm an uber perl noob, so go easy...
Anyway, I've got a script which uses File::Recurse to parse a directory and returns a very simple HTML file with a link to each directory that it finds. (It strips the directories higher than the server's document root.) This script appears to work fine on small directories. However, when I get into searching a really large directory with lots of subdirs, it takes a VERY long time (3+ hours) and apparently lots of memory. (The machine is a dual P3 600 w/ 640MB, so it's not a hardware issue.) Is there anything I can do to optimize this? Is there a better solution? Like I said, I'm a n00b, so this is based mostly off File::Recurse's README.
    #!/usr/bin/perl
    use File::Recurse;

    print "Content-Type: text/html\n\n";
    print "<html><body><h1>Web Server Directory Listing</h1>";

    my %files = &Recurse(['/Intranet/html'], {});
    my @dirs  = ();
    my @files = ();

    foreach (sort keys %files) {
        @dirs = split(/\/Intranet\/html/, $_);
        print "<A href='@dirs'>@dirs</a><br>";
        foreach (@{ $files{$_} }) {
            @files = (@files, $_);
        }
    }

    $f = @files;
    $d = @dirs;
    print "<br><br>";
    print "$d total directories were found.";
    print "$f total files were found.";
    print "</body></html>";

Replies are listed 'Best First'.
Re: recurse directory script
by tamills (Acolyte) on Jan 09, 2002 at 19:31 UTC
    The first thing to try is instead of doing this:

    @files = (@files,$_);

    do this:

    push @files, $_;

    Here are some results I got from profiling some comparable code:

    Total Elapsed Time = -1.8e-00 Seconds
      User+System Time = 100.0439 Seconds

    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
     99.9   100.0 100.00      1   100.00 100.00 main::f2
     0.03   0.030  0.030      1   0.0300 0.0300 main::f1
    sub f2 used the original method. sub f1 used push(). As you can see, it made quite a difference.
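    A quick way to reproduce this comparison yourself is with the core Benchmark module; the element count below is arbitrary, not taken from Drew's run:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 2_000;    # arbitrary list size; raise it and the gap widens

cmpthese(3, {
    copy => sub {
        my @files = ();
        @files = (@files, $_) for 1 .. $n;   # rebuilds the array every time
    },
    push => sub {
        my @files = ();
        push @files, $_ for 1 .. $n;         # appends in place
    },
});
```

    Both build identical lists; only the cost differs.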

    Drew

    Be careful what you wish for... (Mr. Limpet)
    For where you treasure is there shall your heart be also (Christ)

Re: recurse directory script
by mkmcconn (Chaplain) on Jan 09, 2002 at 05:45 UTC

    CHRYSt, you'll find many posts here directing you to File::Find. It's very easy to use (although there are some peculiarities that may put you off at first; much clearer code can be produced with it than you'll find in my sample below).

    Download the snippet below, and see if it does pretty much what you want. I can hardly imagine it taking 3 hours, even on a very large and deep directory.

    If it works for you, take a closer look at the module and the examples in the documentation (perldoc File::Find).

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    print "Content-Type: text/html\n\n";
    print "<html><body><h1>Web Server Directory Listing</h1>";

    my $dir_count  = 0;
    my $file_count = 0;

    find(\&{ sub {
        if (-d $_) {
            # File::Find puts us in the directory.
            # We can stat, copy or rename without
            # needing to know which directory we are in.
            print $dir_count > 0 ? "</ul>\n" : "";
            print "<h3>Directory ", ++$dir_count,
                  ": <a href='$File::Find::dir/$_'>$_</a></h3>\n<ul>\n";
            # $File::Find::dir is the current directory
        }
        else {
            print "<li>", ++$file_count,
                  " <a href='$File::Find::name'>$_</a></li>\n";
            # $File::Find::name is the full path of the current file.
        }
    } }, '/Intranet/html');

    print "\n</ul>";
    print "</body></html>";

    mkmcconn
    fiddled with text after posting.

Re: recurse directory script
by dmmiller2k (Chaplain) on Jan 09, 2002 at 20:02 UTC

    ++tamills and ++LordAvatar for their helpful tips. In particular, tamills's advice should produce a massive performance boost, because in your code, @files = (@files,$_); is essentially replacing the entire contents of @files with its prior contents plus a new element, EVERY TIME you add to it.

    The time it takes to add a new element this way grows with the number of elements already in the array, so building the whole list takes time quadratic in its length, versus push, which appends in approximately constant time.
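    To see the arithmetic, count element copies for a hypothetical 1,000-element list (the numbers are illustrative, not measured from your script):

```perl
use strict;
use warnings;

my $n = 1_000;

# Copying approach: appending when i elements already exist
# copies all i of them into the new array first.
my $copy_ops = 0;
$copy_ops += $_ for 0 .. $n - 1;    # 0 + 1 + ... + (n-1) = n(n-1)/2

# push: each append is a single write.
my $push_ops = $n;

printf "copy rebuild: %d element copies; push: %d appends\n",
       $copy_ops, $push_ops;        # 499500 vs 1000
```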

    Mustn't forget the, ahem, obligatory standard advice:

    As a rule, put -w on your shebang line, and use strict;, and take the time to remove any errors or warnings reported.

    #!/usr/bin/perl -w
    use strict;
    use File::Recurse;
    # ...

    BTW, subroutine calls no longer require the & prefix character. Using it makes your code unnecessarily more difficult to read.

    Update: fiddled with text.

    dmm

    You can give a man a fish and feed him for a day ...
    Or, you can
    teach him to fish and feed him for a lifetime

      BTW, subroutine calls no longer require the & prefix character. Using it makes your code unnecessarily more difficult to read.

      To the contrary, I find that prepending an & to subroutine calls makes my code easier to read. For one thing, it helps some syntax-highlighting editors highlight the subs, which is helpful. For another, it keeps with the theme of "everything with a sigil is a thingy that you can take a reference to", which I find helpful as well. The obvious caveat is that the & makes the current @_ visible to the subroutine; this hasn't bitten me in the ass yet, though.

      --
      :wq
        I prefer leaving off the & because I have to program in multiple languages, and like to preserve some visual uniformity.

        But if you include it, as long as you also use parens you will not re-use @_. So you can put your mind at ease on that detail...
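        A minimal sketch of both behaviors (the subroutine names are made up for illustration):

```perl
use strict;
use warnings;

sub show_args { return scalar @_ }    # reports how many args it received

sub caller_demo {
    my @results;
    push @results, show_args(1, 2);   # plain call: @_ is (1, 2)        -> 2
    push @results, &show_args;        # & with no parens: our @_ leaks  -> 3
    push @results, &show_args();      # & with parens: fresh, empty @_  -> 0
    return @results;
}

my @r = caller_demo('a', 'b', 'c');   # caller_demo's @_ has 3 elements
print "@r\n";                         # prints "2 3 0"
```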

Re: recurse directory script
by LordAvatar (Acolyte) on Jan 09, 2002 at 19:35 UTC

    Chryst,

    Try removing the print "<A href='@dirs'>@dirs</a><br>"; line. You will slow down performance greatly by printing large directory listings to the screen.
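    If you do want to keep the links, one cheap mitigation is to accumulate the page in a scalar and print once at the end, rather than printing inside the loop. A sketch, with a stand-in directory list:

```perl
use strict;
use warnings;

my @dirs = ('/docs', '/images', '/cgi-bin');    # stand-in for the real list

my $html = '';
$html .= "<a href='$_'>$_</a><br>\n" for @dirs; # build in memory
print $html;                                    # single write at the end
```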

    -Lord Avatar
    "A simple truth is but a complicated lie..." -Nietzsche
THANKS!! Re: recurse directory script
by CHRYSt (Acolyte) on Jan 09, 2002 at 23:12 UTC
    Wow, didn't expect help so quick.
    Using push, plus a slight change in the way it prints each link, brought the time down to ~5 mins. Instead of print "<A href='@dirs'>@dirs</a><br>"; I used just print "<A href='$_'>$_</A><br>";
    works now. Thanks again. :)
Re: recurse directory script
by Anonymous Monk on Jan 10, 2002 at 00:17 UTC
    You need to dump the library and roll your own. %files is eating up memory since you are storing the data there for a time before using it. Write a breadth first traversal routine and stream the output to the browser without storing more than one line at a time. Use a command pipe with a sorted ls command as the stream. Use counters and -d -f tests to print stats. You can probably find a breadth first traversal on the web.
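    A rough sketch of that idea, with no module and a plain FIFO queue; it streams each directory's line as soon as it is seen instead of collecting everything in a hash first (the starting path is the one from the question):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Breadth-first directory walk: visit a directory, print its link
# immediately, queue its subdirectories for later.
sub walk_bfs {
    my ($root) = @_;
    my @queue = ($root);
    my ($dirs, $files) = (0, 0);
    while (my $dir = shift @queue) {
        opendir my $dh, $dir or next;
        $dirs++;
        print "<a href='$dir'>$dir</a><br>\n";   # stream, don't store
        for my $entry (sort readdir $dh) {
            next if $entry eq '.' or $entry eq '..';
            my $path = "$dir/$entry";
            if    (-d $path) { push @queue, $path }  # breadth first
            elsif (-f $path) { $files++ }
        }
        closedir $dh;
    }
    return ($dirs, $files);
}

my ($d, $f) = walk_bfs('/Intranet/html');
print "<br>$d total directories were found. $f total files were found.\n";
```

    Memory use is bounded by the widest level of the tree rather than the whole listing.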
