We have an "application" (which I have no control over at all) that dumps up to 150 Word documents a day onto an FTP server.

Part of each document's name contains a "status" field, which can be a combination of a couple of numbers.

Our support desk has always moved a document to a new status by changing these numbers. They did this with an FTP client, renaming the document directly on the server. Or, if there were multiple directories to search, they would typically open the "root" share, search the directories for the document, and make the change there. However, the application team has asked that these changes be logged, or they will take this functionality away. No problem: the support desk came to me and asked me to write a web-based form to make the changes to the documents, which I have done.
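Since the whole point of the form is logging, here is a minimal sketch (in Python rather than Perl, purely for illustration) of a rename that writes a log entry only after the rename has succeeded. The helper name `rename_with_log`, the CSV log format, and the `user` field are hypothetical, and the sketch assumes the web server can reach the documents through an ordinary file-system path such as the share.

```python
import csv
import os
from datetime import datetime, timezone


def rename_with_log(directory, old_name, new_name, user, log_path):
    """Rename a document in `directory` and append an audit record.

    Hypothetical helper: the directory layout, the CSV log format and the
    `user` value are assumptions, not the application's real conventions.
    """
    old_path = os.path.join(directory, old_name)
    new_path = os.path.join(directory, new_name)
    if os.path.exists(new_path):
        # Refuse to overwrite a different document that already has that name.
        raise FileExistsError(new_path)
    os.rename(old_path, new_path)
    # Log only after the rename succeeded, so every row is a real change.
    with open(log_path, "a", newline="") as fh:
        csv.writer(fh).writerow([
            datetime.now(timezone.utc).isoformat(),  # when
            user,                                    # who
            directory,                               # where
            old_name,                                # old status (in the name)
            new_name,                                # new status (in the name)
        ])
    return new_path
```

Because the log row is written after `os.rename` returns, a failed rename never leaves a misleading entry behind.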

The problem lies in the searching. With an FTP client (in a specific folder) it was very quick. Through the share (multiple directories) it was slower, but still relatively fast. Now they are using a pre-existing web-based search that is extremely slow.

Feeling sorry for these guys (I was in that area), I took it upon myself to write a faster search than the one they were using (the original was not a Perl-based solution). The problem is, I cannot get it any faster: searching the entire tree takes 44 seconds on average, and less for fewer folders.
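A common way to speed up this kind of search is to match on file names only (never opening the documents) and to avoid a separate stat() call per entry. A minimal sketch in Python, not the author's script; `find_documents` and its `max_hits` cutoff are hypothetical, and it assumes the status is visible in the file name and the tree is reachable as a local or UNC path:

```python
import os


def find_documents(root, fragment, max_hits=None):
    """Return paths under `root` whose file name contains `fragment`.

    Only names are compared; no document is opened. os.scandir's
    DirEntry.is_dir() can usually answer from the directory listing
    itself, without an extra stat() call per entry.
    """
    fragment = fragment.lower()  # case-insensitive, like Windows file names
    hits = []
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # search this folder later
                    elif fragment in entry.name.lower():
                        hits.append(entry.path)
                        if max_hits is not None and len(hits) >= max_hits:
                            return hits  # stop early once enough are found
        except OSError:
            # Skip folders the web server account cannot read.
            continue
    return hits
```

Passing `max_hits=1` when the support desk is looking for one specific document lets the search stop as soon as it is found instead of always walking the whole tree.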

So, after all that rambling, my question to you is: HOW? I've thought about building an index with fork and Storable as soon as the client loads the search form, but
a) I'm not sure whether fork plays nicely with web pages, and
b) the full index still takes about 45 seconds to build, so the Storable data won't be ready before the client hits submit.
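If the index is built in the background, the form does not have to wait for the current build: it can answer from the last complete copy. A minimal sketch in Python, with `pickle` standing in for Storable; the function names and the index file location are hypothetical:

```python
import os
import pickle
import tempfile


def build_index(root):
    """Map each lower-cased file name to the full paths that have that name."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.setdefault(name.lower(), []).append(os.path.join(dirpath, name))
    return index


def save_index(index, path):
    """Write to a temporary file first, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            pickle.dump(index, fh)
        os.replace(tmp, path)  # replaces any previous index file
    except BaseException:
        os.unlink(tmp)  # do not leave a half-written temp file behind
        raise


def load_index(path):
    with open(path, "rb") as fh:
        return pickle.load(fh)


def search_index(index, fragment):
    """Return every indexed path whose file name contains `fragment`."""
    fragment = fragment.lower()
    return sorted(p for name, paths in index.items() if fragment in name for p in paths)
```

Because readers only ever open the file that `os.replace` put in place, a search never sees a half-written index; the trade-off is that results may be one build old.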

I've also thought about running a separate process on the web server that re-indexes every five minutes or so, but
a) this will put unnecessary load on the server, and
b) it will not be real-time.
Five minutes is not a long lag, and from experience I think it would work out OK (the end user normally does not ask for a status change until several minutes after the document is created), but there is still the outside possibility that it will not suffice.
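To keep a periodic indexer cheap, one option (a swapped-in technique, not something from the original post) is to re-list only the folders whose modification time has changed since the last pass, so an unchanged tree costs roughly one stat() per folder. A minimal sketch in Python, assuming the file system updates a folder's modification time when a file inside it is created, deleted, or renamed (POSIX file systems do; this should be verified on the NT share). `IncrementalIndex` is a hypothetical name:

```python
import os


class IncrementalIndex:
    """File-name index that re-lists only folders whose mtime has changed.

    Assumption: a folder's mtime changes whenever an entry directly
    inside it is created, deleted or renamed.
    """

    def __init__(self, root):
        self.root = root
        # folder path -> (mtime_ns, [subfolder paths], [file names])
        self.dirs = {}

    @staticmethod
    def _list(path):
        subdirs, files = [], []
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    subdirs.append(entry.path)
                else:
                    files.append(entry.name)
        return subdirs, files

    def refresh(self):
        """Bring the index up to date; return how many folders were re-listed."""
        relisted = 0
        seen = set()
        stack = [self.root]
        while stack:
            path = stack.pop()
            try:
                mtime = os.stat(path).st_mtime_ns
                cached = self.dirs.get(path)
                if cached is None or cached[0] != mtime:
                    cached = (mtime, *self._list(path))  # folder changed: re-list it
                    self.dirs[path] = cached
                    relisted += 1
            except OSError:
                continue  # folder vanished or is unreadable; dropped below
            seen.add(path)
            stack.extend(cached[1])  # still visit known subfolders to check them
        for gone in set(self.dirs) - seen:
            del self.dirs[gone]  # forget folders that no longer exist
        return relisted

    def search(self, fragment):
        fragment = fragment.lower()
        return sorted(
            os.path.join(path, name)
            for path, (_mtime, _subdirs, files) in self.dirs.items()
            for name in files
            if fragment in name.lower()
        )
```

Calling `refresh()` every minute or so, or right before each search if it proves fast enough on the share, would narrow the five-minute window without re-listing the whole tree every time.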

Any other ideas?
If it matters, the server my script runs on is IIS 5.0 on NT 4.

In reply to Faster searching by the_slycer
