PerlMonks  

recurse directory script

by CHRYSt (Acolyte)
on Jan 09, 2002 at 04:06 UTC ( [id://137318] )

CHRYSt has asked for the wisdom of the Perl Monks concerning the following question:

Look, my first post. :)
Ok, I'm an uber perl noob, so go easy...
Anyway, I've got a script which uses File::Recurse to parse a directory and returns a very simple HTML file with a link to each directory that it finds. (It strips the directories higher than the server's document root.) This script appears to work fine on small directories. However, when I get into searching a really large directory with lots of subdirs, it takes a VERY long time (3+ hours) and apparently lots of memory. (The machine is a dual P3 600 w/ 640MB, so it's not a hardware issue.) Is there anything I can do to optimize this? Is there a better solution? Like I said, I'm a n00b, so this is based mostly off File::Recurse's README.
    #!/usr/bin/perl
    use File::Recurse;

    print "Content-Type: text/html\n\n";
    print "<html><body><h1>Web Server Directory Listing</h1>";

    my %files = &Recurse(['/Intranet/html'], {});
    my @dirs  = ();
    my @files = ();

    foreach (sort keys %files) {
        @dirs = split(/\/Intranet\/html/, $_);
        print "<A href='@dirs'>@dirs</a><br>";
        foreach (@{ $files{$_} }) {
            @files = (@files, $_);
        }
    }

    $f = @files;
    $d = @dirs;
    print "<br><br>";
    print "$d total directories were found.";
    print "$f total files were found.";
    print "</body></html>";

Replies are listed 'Best First'.
Re: recurse directory script
by tamills (Acolyte) on Jan 09, 2002 at 19:31 UTC
    The first thing to try is instead of doing this:

    @files = (@files,$_);

    do this:

    push @files, $_;

    Here are some results I got from profiling some comparable code:

    Total Elapsed Time = -1.8e-00 Seconds
      User+System Time = 100.0439 Seconds

    Exclusive Times
    %Time ExclSec CumulS #Calls sec/call Csec/c Name
     99.9   100.0 100.00      1   100.00 100.00 main::f2
     0.03   0.030  0.030      1   0.0300 0.0300 main::f1
    sub f2 used the original method. sub f1 used push(). As you can see, it made quite a difference.
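    A quick way to reproduce this comparison yourself is with the core Benchmark module; the element count below is arbitrary, not taken from Drew's run:

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $n = 2_000;    # arbitrary list size; raise it and the gap widens

cmpthese(3, {
    copy => sub {
        my @files = ();
        @files = (@files, $_) for 1 .. $n;   # rebuilds the array every time
    },
    push => sub {
        my @files = ();
        push @files, $_ for 1 .. $n;         # appends in place
    },
});
```

    Both build identical lists; only the cost differs.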

    Drew

    Be careful what you wish for... (Mr. Limpet)
    For where you treasure is there shall your heart be also (Christ)

Re: recurse directory script
by mkmcconn (Chaplain) on Jan 09, 2002 at 05:45 UTC

    CHRYSt, you'll find many posts here directing you to File::Find. It's very easy to use (although there are some peculiarities that may put you off at first; much clearer code can be produced with it than you'll find in my sample below).

    Download the snippet below, and see if it does pretty much what you want. I can hardly imagine it taking 3 hours, even on a very large and deep directory.

    If it works for you, take a closer look at the module and the examples in the documentation (perldoc File::Find).

    #!/usr/bin/perl -w
    use strict;
    use File::Find;

    print "Content-Type: text/html\n\n";
    print "<html><body><h1>Web Server Directory Listing</h1>";

    my $dir_count  = 0;
    my $file_count = 0;

    find(\&{ sub {
        if (-d $_) {
            # File::Find puts us in the directory.
            # We can stat, copy or rename without
            # needing to know which directory we are in.
            print $dir_count > 0 ? "</ul>\n" : "";
            print "<h3>Directory ", ++$dir_count,
                  ": <a href='$File::Find::dir/$_'>$_</a></h3>\n<ul>\n";
            # $File::Find::dir is the current directory
        }
        else {
            print "<li>", ++$file_count,
                  " <a href='$File::Find::name'>$_</a></li>\n";
            # $File::Find::name is the full path of the current file.
        }
    } }, '/Intranet/html');

    print "\n</ul>";
    print "</body></html>";

    mkmcconn
    fiddled with text after posting.

Re: recurse directory script
by dmmiller2k (Chaplain) on Jan 09, 2002 at 20:02 UTC

    ++tamills and ++LordAvatar for their helpful tips. In particular, tamills's advice should produce a massive performance boost, because in your code, @files = (@files,$_); is essentially replacing the entire contents of @files with its prior contents plus a new element, EVERY TIME you add to it.

    The time it takes to add a new element this way grows with the number of elements already in the array, so building the whole list takes time quadratic in its length, versus push, which appends in approximately constant time.
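    To see the arithmetic, count element copies for a hypothetical 1,000-element list (the numbers are illustrative, not measured from your script):

```perl
use strict;
use warnings;

my $n = 1_000;

# Copying approach: appending when i elements already exist
# copies all i of them into the new array first.
my $copy_ops = 0;
$copy_ops += $_ for 0 .. $n - 1;    # 0 + 1 + ... + (n-1) = n(n-1)/2

# push: each append is a single write.
my $push_ops = $n;

printf "copy rebuild: %d element copies; push: %d appends\n",
       $copy_ops, $push_ops;        # 499500 vs 1000
```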

    Mustn't forget the, ahem, obligatory standard advice:

    As a rule, put -w on your shebang line, and use strict;, and take the time to remove any errors or warnings reported.

    #!/usr/bin/perl -w
    use strict;
    use File::Recurse;
    # ...

    BTW, subroutine calls no longer require the & prefix character. Using it makes your code unnecessarily more difficult to read.

    Update: fiddled with text.

    dmm

    You can give a man a fish and feed him for a day ...
    Or, you can
    teach him to fish and feed him for a lifetime

      BTW, subroutine calls no longer require the & prefix character. Using it makes your code unnecessarily more difficult to read.

      To the contrary, I find that prepending an & to subroutine calls makes my code easier to read. For one thing, it helps some syntax-highlighting editors highlight the subs, which is helpful. For another, it keeps with the theme of "everything with a sigil is a thingy that you can take a reference to", which I find helpful as well. The obvious caveat is that the & makes the current @_ visible to the subroutine; this hasn't bitten me in the ass yet, though.

      --
      :wq
        I prefer leaving off the & because I have to program in multiple languages, and like to preserve some visual uniformity.

        But if you include it, as long as you also use parens you will not re-use @_. So you can put your mind at ease on that detail...
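        A minimal sketch of both behaviors (the subroutine names are made up for illustration):

```perl
use strict;
use warnings;

sub show_args { return scalar @_ }    # reports how many args it received

sub caller_demo {
    my @results;
    push @results, show_args(1, 2);   # plain call: @_ is (1, 2)        -> 2
    push @results, &show_args;        # & with no parens: our @_ leaks  -> 3
    push @results, &show_args();      # & with parens: fresh, empty @_  -> 0
    return @results;
}

my @r = caller_demo('a', 'b', 'c');   # caller_demo's @_ has 3 elements
print "@r\n";                         # prints "2 3 0"
```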

Re: recurse directory script
by LordAvatar (Acolyte) on Jan 09, 2002 at 19:35 UTC

    Chryst,

    Try removing the print "<A href='@dirs'>@dirs</a><br>"; line. You will slow down performance greatly by printing large directory listings to the screen.
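    If you do want to keep the links, one cheap mitigation is to accumulate the page in a scalar and print once at the end, rather than printing inside the loop. A sketch, with a stand-in directory list:

```perl
use strict;
use warnings;

my @dirs = ('/docs', '/images', '/cgi-bin');    # stand-in for the real list

my $html = '';
$html .= "<a href='$_'>$_</a><br>\n" for @dirs; # build in memory
print $html;                                    # single write at the end
```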

    -Lord Avatar
    "A simple truth is but a complicated lie..." -Nietzsche
THANKS!! Re: recurse directory script
by CHRYSt (Acolyte) on Jan 09, 2002 at 23:12 UTC
    Wow, didn't expect help so quick.
    Using push, plus a slight change in the way it prints each link, brought the time down to ~5 mins. Instead of print "<A href='@dirs'>@dirs</a><br>"; I used just print "<A href='$_'>$_</A><br>";
    works now. Thanks again. :)
Re: recurse directory script
by Anonymous Monk on Jan 10, 2002 at 00:17 UTC
    You need to dump the library and roll your own. %files is eating up memory since you are storing the data there for a time before using it. Write a breadth first traversal routine and stream the output to the browser without storing more than one line at a time. Use a command pipe with a sorted ls command as the stream. Use counters and -d -f tests to print stats. You can probably find a breadth first traversal on the web.
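    A rough sketch of that idea, with no module and a plain FIFO queue; it streams each directory's line as soon as it is seen instead of collecting everything in a hash first (the starting path is the one from the question):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Breadth-first directory walk: visit a directory, print its link
# immediately, queue its subdirectories for later.
sub walk_bfs {
    my ($root) = @_;
    my @queue = ($root);
    my ($dirs, $files) = (0, 0);
    while (my $dir = shift @queue) {
        opendir my $dh, $dir or next;
        $dirs++;
        print "<a href='$dir'>$dir</a><br>\n";   # stream, don't store
        for my $entry (sort readdir $dh) {
            next if $entry eq '.' or $entry eq '..';
            my $path = "$dir/$entry";
            if    (-d $path) { push @queue, $path }  # breadth first
            elsif (-f $path) { $files++ }
        }
        closedir $dh;
    }
    return ($dirs, $files);
}

my ($d, $f) = walk_bfs('/Intranet/html');
print "<br>$d total directories were found. $f total files were found.\n";
```

    Memory use is bounded by the widest level of the tree rather than the whole listing.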
