in reply to Searching many files

Another way to attack this would be to split the data into a couple of files (or at least separate out some of the data) so the user can search by subject, From or To address, or by delivery date. (Just for kicks, I would convert the date to a seconds-since-epoch format for range sorting.)
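A quick sketch of the epoch conversion, assuming the emails carry an ordinary RFC-2822 Date: header (the sample header value here is made up):

```python
# Convert an email Date: header to seconds since the epoch so the
# delivery-date field can be range-sorted numerically.
from email.utils import parsedate_to_datetime

def date_to_epoch(date_header):
    """Return the Date: header value as an integer epoch timestamp."""
    return int(parsedate_to_datetime(date_header).timestamp())

print(date_to_epoch("Mon, 01 Jan 2001 12:00:00 +0000"))  # 978350400
```

Once every record carries that number, a date-range search is just a numeric comparison.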

This is a sort of indexing, but it will minimize the raw volume of data to be searched.

A point of concern: what character did the OCR insert if there was a recognition failure? Could that invalidate full string searches?
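One hedged workaround, assuming the OCR engine emits a single known placeholder character (I'm guessing '?' here) on a recognition failure: treat that placeholder as a one-character wildcard when matching, so a stored line like "me?ting" still matches a search for "meeting".

```python
def ocr_match(stored_text, query, placeholder="?"):
    """True if query occurs in stored_text, letting each OCR
    placeholder character in the stored text stand for any one
    character of the query."""
    n = len(query)
    for i in range(len(stored_text) - n + 1):
        window = stored_text[i:i + n]
        # Every character must match exactly or be a placeholder.
        if all(w == q or w == placeholder for w, q in zip(window, query)):
            return True
    return False

print(ocr_match("the me?ting was moved", "meeting"))  # True
```

If the OCR uses some other failure marker, substitute it for the placeholder argument.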

This does point to a need for an index of words and the emails that use them. That would allow the results to be scored as to the odds that each is a hit.

One way would be to create a file named for each word found in an email, and append the email's name to that file. Then push the next email against the previous list. You end up with a directory containing a file named for each word used in the emails, each file listing the email (or tiff) names that used that word. Each file could then be sorted uniquely.
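The same word-to-emails index can be sketched in memory with a dict instead of one file per word; the email names and bodies below are made-up examples:

```python
# Minimal inverted index: map each word to the sorted, de-duplicated
# list of email (or tiff) names whose text contains it.
from collections import defaultdict

def build_index(emails):
    """emails: dict of {email_name: body_text}.
    Returns {word: sorted list of unique email names}."""
    index = defaultdict(set)
    for name, body in emails.items():
        for word in body.lower().split():
            index[word].add(name)
    # Sets give the "sorted uniquely" behavior for free.
    return {word: sorted(names) for word, names in index.items()}

emails = {
    "0001.tiff": "meeting agenda for friday",
    "0002.tiff": "friday lunch plans",
}
index = build_index(emails)
print(index["friday"])  # ['0001.tiff', '0002.tiff']
```

Scoring then falls out naturally: rank each email by how many of the query's words list it in the index.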

This setup gives you a great deal of flexibility in presentation, as well as in search abilities.

Or search for a module to do this work for you; the only thing a module would be missing is the ability to handle missed OCR reads.

Good Luck!
dageek