We have about 200,000 files scattered across various subdirectories on a network server. To solve the problem of finding where someone has moved an individual file, I created a simple CGI script that searches for a filename (or part of a filename) and, when it finds a match, outputs a link to it. Searching the live filesystem on every request would obviously be extremely slow, so instead I search a delimited text file that lists all the filenames and paths. I regenerate this text file every so often, usually at the end of the day. It is currently 31 MB in size.

But the problem I'm now running into is speed. So many people have started using the interface that it's getting rather slow. To make matters worse, we expect another 200,000 files or so to be dumped onto the server soon. I have been thinking about changing the interface slightly so that people search only subsets of the entire filebase. I've also thought that a DBM hash might be faster to search than a text file, but I'm not sure of this.
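To make the DBM idea concrete, here is a minimal sketch rather than the production script. It assumes the core SDBM_File module, a temporary DBM file, and three made-up records in the same comma-delimited layout as the text file; keying on the filename is also an assumption. It suggests that an exact filename is a single keyed lookup, while a partial filename still means visiting every key.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;                  # ships with core Perl
use File::Temp qw(tempdir);

# Hypothetical sample records: filename,cms,path,size,day,time
my @records = (
    'report.doc,x,\finance\2001,1024,01/02/2001,10:15',
    'budget.xls,x,\finance\2001,2048,01/03/2001,11:30',
    'report_old.doc,x,\archive,512,12/01/2000,09:00',
);

# Build the DBM file in a throwaway directory (assumption for the demo).
my $dir = tempdir(CLEANUP => 1);
tie my %db, 'SDBM_File', "$dir/locs", O_RDWR | O_CREAT, 0666
    or die "Cannot tie DBM file: $!";

# Key on the filename; the value is the rest of the record.
for my $rec (@records) {
    my ($filename, $rest) = split /,/, $rec, 2;
    $db{$filename} = $rest;
}

# Exact filename: one keyed lookup, no scan.
my $exact = $db{'budget.xls'};

# Partial filename: every key still has to be examined.
my @partial = sort grep { index($_, 'report') > -1 } keys %db;

print "exact:   $exact\n";
print "partial: @partial\n";

untie %db;
```

Two caveats with this layout: identical filenames in different directories would overwrite each other under a filename key, and the substring case does the same amount of key-by-key work as the text-file scan, so any gain would likely come only from exact-name searches.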

Would a DBM hash improve efficiency? Any other ideas? Thanks for any suggestions. Here is the little bit of code that I use to search the text file:

    sub lookup {
        my $infile = "locs.txt";
        my $href;
        $table = "<TABLE CELLPADDING=10><TR><TD><B>Path<TD><B>Size<TD><B>Modified</TR>";
        open(F, "+< $infile");
        while (<F>) {
            ($filename, $cms, $path, $size, $day, $time) = split /,/, $_;
            if (index($filename, $to_find) > -1) {
                $href = "file:\\\\netd\\data".$path."\\$filename";
                $href =~ s/\s/%20/g;
                $table .= "<TR><TD><A HREF=\"$href\">$path\\$filename</A><TD>$size<TD>$day $time</TR>";
            }
        }
        close(F);
        $table .= "</TABLE>";
        $table =~ s/"//g;
    }
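One low-risk variation on the same linear scan, offered as a sketch and not as a measured improvement: open the file read-only, and reject a line with a single index() call on the whole line before paying for the split. The function name find_matches, the returned array-of-arrays shape, and the temporary sample file are all assumptions made for illustration; the field order is taken from the code above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return [filename, path, size, day, time] for each record in $infile
# whose filename contains $to_find.
sub find_matches {
    my ($infile, $to_find) = @_;
    my @matches;

    # Read-only, three-argument open with an error check.
    open(my $fh, '<', $infile) or die "Cannot open $infile: $!";
    while (my $line = <$fh>) {
        # Cheap reject: if the term is nowhere on the line it cannot be
        # in the filename, so skip the split entirely.
        next if index($line, $to_find) < 0;

        chomp $line;
        my ($filename, $cms, $path, $size, $day, $time) = split /,/, $line;

        # The term may have matched the path or another field instead.
        next if index($filename, $to_find) < 0;

        push @matches, [$filename, $path, $size, $day, $time];
    }
    close($fh);
    return @matches;
}

# Demo on a hypothetical three-line sample file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} "report.doc,x,\\finance,1024,01/02/2001,10:15\n",
                "budget.xls,x,\\reports,2048,01/03/2001,11:30\n",
                "report_old.doc,x,\\archive,512,12/01/2000,09:00\n";
close($tmp_fh);

for my $m (find_matches($tmp_name, 'report')) {
    print join(' ', @$m), "\n";
}
```

Returning plain records instead of building the HTML inside the loop keeps the search separate from the table markup. Whether the early reject pays off depends on how many lines actually match, so it is worth timing against the real 31 MB file.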

In reply to I need speed by Galen
