in reply to Searching many files

Out of curiosity, are you OCRing the images from Perl (you mentioned TIFFs), or is that just background and you already have the emails in text form? Also, would it make sense to store all of the emails in a TEXT column in a database and then use LIKE to find the emails containing a given word or combination of words? And what about multithreading? This kind of problem should parallelize well, since searching one email doesn't depend on the result of searching any other. Could you spread the work across several computers to speed it up (i.e. each computer takes 1/nth of the data set)? Just some thoughts nobody's mentioned yet. Other posters' ideas (namely indexing the e-mails) would work extremely well too.
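To make the "each worker takes 1/nth" idea concrete, here is a rough sketch in Python. It is not anyone's actual setup: the function names, the sample emails, and the choice of a thread pool are all my own assumptions, and the matching is a plain case-insensitive substring test rather than whole-word matching.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, n):
    """Deal items into n slices so each worker gets roughly 1/n of the data."""
    return [items[i::n] for i in range(n)]


def search_slice(emails, words):
    """Return the ids of emails in this slice that contain every word."""
    wanted = [w.lower() for w in words]
    hits = []
    for email_id, text in emails:
        lowered = text.lower()
        if all(w in lowered for w in wanted):
            hits.append(email_id)
    return hits


def parallel_search(emails, words, workers=4):
    """Search each slice independently, then merge the per-slice hits."""
    slices = split_round_robin(emails, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(search_slice, slices, [words] * workers)
    return sorted(hit for part in results for hit in part)


# Hypothetical sample data: (id, text) pairs standing in for the real emails.
emails = [
    (1, "Quarterly budget attached"),
    (2, "Lunch on Friday?"),
    (3, "Revised BUDGET for the quarterly review"),
]

print(parallel_search(emails, ["budget", "quarterly"], workers=2))  # [1, 3]
```

No slice needs anything from another slice, so the same split would work with one slice per machine instead of one per thread. Note that in standard CPython, threads won't speed up the matching itself; a real speedup would need separate processes or separate computers, with the slicing logic unchanged.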
