We have an "application" (which I have no control over at all) that dumps up to 150 Word documents a day onto an FTP server.
Part of each document's name is a "status" field, made up of a combination of a couple of numbers.
Our support desk has always changed the numbers in these fields to move a doc to a new status. They did this with an FTP client, renaming the doc on the server. Or, if there were multiple directories to search, they would typically go to the "root" share, search the directories for the doc, and rename it there. However, the application guys have asked that these changes be logged from now on, or they will take this ability away. No problem: the support desk came to me and asked me to write a web-based form to make the changes to the documents, which I have done.
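For illustration, here is a minimal Python sketch of what the form's back end might do (the actual script is Perl). It assumes the script can reach the documents through the file share and that a status change is a plain rename within the same directory; `rename_with_log`, the log path and the tab-separated log format are all hypothetical.

```python
import datetime
import os


def rename_with_log(old_path, new_name, user, log_path):
    """Hypothetical helper: rename a document within its own directory and
    append one audit line (timestamp, user, old path, new path)."""
    if os.path.basename(new_name) != new_name:
        # Only a bare file name is accepted, so the doc cannot be moved elsewhere.
        raise ValueError("new_name must not contain a directory part")
    new_path = os.path.join(os.path.dirname(old_path), new_name)
    if os.path.exists(new_path):
        # On POSIX, os.rename would silently overwrite; refuse instead.
        raise FileExistsError(new_path)
    os.rename(old_path, new_path)
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{user}\t{old_path}\t{new_path}\n")
    return new_path
```

Logging only after a successful rename means the log never records a change that did not happen.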
The problem lies in the searching. With an FTP client (pointed at a specific folder) it was very quick. Via the share (multiple directories) it was slower, but still fairly fast. Now they are using a pre-existing web-based search that is extremely slow.
Feeling sorry for these guys (I used to work in that area), I took it upon myself to write a faster search than the one they were using (the original was not a Perl solution). The problem is, I cannot get it any faster: searching the entire tree takes 44 seconds on average, and less when fewer folders are searched.
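As a point of reference, a full-tree search boils down to something like the following Python sketch (the real script is Perl; `find_docs` and the substring match on the file name are assumptions). Every directory has to be listed once per search, so the cost grows with the number of folders, which matches the observation that smaller subsets are faster.

```python
import os


def find_docs(root, needle):
    """Hypothetical baseline: list every directory under root once and
    yield paths whose file name contains needle (case-insensitive)."""
    needle = needle.lower()
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)  # descend later
                    elif needle in entry.name.lower():
                        yield entry.path
        except OSError:
            # Unreadable directory: skip it rather than abort the whole search.
            continue
```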
So, after all that rambling, my question to you is: HOW? I've thought about building an index with fork and Storable as soon as the client loads the search form, but
a) I'm not sure whether fork plays nicely with web pages, and
b) the full index still takes 45 seconds to build, so Storable would not have finished before the client hits submit.
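On point (a), one common pattern is to start the indexer as a fully separate process whose standard handles are detached. In many CGI setups the server keeps the response open while any process still holds the script's output pipe, which is why a naive fork can hang the page; whether detaching is sufficient under IIS 5.0 is an assumption worth testing. A Python sketch (the indexer script path and its arguments are hypothetical), which does nothing for point (b):

```python
import subprocess
import sys


def start_background_indexer(script_path, root, index_path):
    """Hypothetical launcher: run the indexer script in its own process and
    return immediately. stdin/stdout/stderr go to DEVNULL so the child does
    not keep the web server's output pipe open."""
    return subprocess.Popen(
        [sys.executable, script_path, root, index_path],
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
```

The request handler would call this and finish the page without waiting on the returned process; the build itself still takes as long as a full search.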
I've also thought about running a separate process on the web server that rebuilds the index every five minutes or so, but
a) this will put unnecessary load on the server, and
b) it will not be real-time.
Five minutes is not a long lag, and from experience I think it would work out OK (end users normally don't ask for a status change until several minutes after the doc is created), but there is still an outside chance that it won't be enough.
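One way to cover that outside chance is to let each search use the periodic index first and fall back to a live scan only when the index has no usable hit. A Python sketch under these assumptions: the index is a pickled dict (standing in for Storable), it is stored outside the document tree, and all function names and the 300-second interval are hypothetical.

```python
import os
import pickle
import time


def scan(root):
    """Return {lower-cased file name: [full paths]} for every file under root."""
    index = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            index.setdefault(name.lower(), []).append(os.path.join(dirpath, name))
    return index


def matches(index, needle):
    """Sorted paths whose file name contains needle (case-insensitive)."""
    needle = needle.lower()
    return sorted(p for name, paths in index.items() if needle in name for p in paths)


def refresh_once(root, index_path):
    """Rebuild the index, then swap it into place so readers never load a partial file."""
    tmp_path = index_path + ".tmp"
    with open(tmp_path, "wb") as fh:
        pickle.dump(scan(root), fh)
    os.replace(tmp_path, index_path)


def refresh_forever(root, index_path, interval=300):
    """Body of the separate indexing process: rebuild every `interval` seconds."""
    while True:
        refresh_once(root, index_path)
        time.sleep(interval)


def find(root, index_path, needle):
    """Per-request lookup: saved index first, live scan as a fallback."""
    try:
        with open(index_path, "rb") as fh:
            index = pickle.load(fh)
    except (OSError, EOFError, pickle.UnpicklingError):
        index = {}
    # Drop entries renamed or removed since the last rebuild.
    hits = [p for p in matches(index, needle) if os.path.exists(p)]
    if not hits:
        # The doc may have arrived after the last rebuild: search the tree directly.
        hits = matches(scan(root), needle)
    return hits
```

With this split, most searches cost one file read, and only searches for very new (or nonexistent) docs pay the full 45-second walk.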
Any more ideas?
If it matters, the server my script runs on is IIS 5.0 on NT 4.