Another way to attack this would be to split the data into a couple of files (or at least separate out some of the data) so the user can search by subject, by from or to address, and by delivery date. (Just for kicks, I would change the date to a seconds-since-epoch format for range sorting.)
This is a form of indexing, and it minimizes the raw volume of data to be searched.
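A rough sketch of the epoch idea, in Python, assuming the delivery date survives OCR as a standard RFC 2822 `Date:` header with an explicit offset (which may well not be true for every scan). The names `to_epoch` and `in_range` are made up for illustration.

```python
from email.utils import parsedate_to_datetime


def to_epoch(date_header):
    """Convert an RFC 2822 date string to whole seconds since the epoch."""
    return int(parsedate_to_datetime(date_header).timestamp())


def in_range(records, start, end):
    """Return the names of records whose timestamp lies in [start, end].

    `records` is a list of (name, epoch_seconds) pairs; results come back
    ordered by timestamp, so a date range is a plain integer comparison.
    """
    ordered = sorted(records, key=lambda rec: rec[1])
    return [name for name, ts in ordered if start <= ts <= end]
```

Once the dates are integers, a range search needs no date parsing at query time.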
A point of concern: what character did the OCR insert if there was a recognition failure? Could this invalidate full string searches?
This does point to a need for an index of words and the emails that use them. That would allow the results to be scored as to the odds that each one is a hit.
One way would be to create a file named for each word found in an email and append the email's name to that file. Then run each following email against the same set of files. You end up with a directory that contains one file per word used in the emails, each file holding the names of the emails (or TIFFs) that used that word. Each file could then be sorted uniquely.
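The word index described above can be sketched as follows, in Python. To keep it short, this uses an in-memory dictionary as a stand-in for the one-file-per-word directory; each dictionary key plays the role of a file name and each value the sorted, unique contents of that file. The function names and the simple "fraction of query words matched" score are my own assumptions, not the only way to score a hit.

```python
import re
from collections import defaultdict

WORD_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return WORD_RE.findall(text.lower())


def build_index(emails):
    """Map each word to the sorted, unique names of the emails containing it.

    `emails` maps an email (or TIFF) name to its OCR'd text.
    """
    index = defaultdict(set)
    for name, body in emails.items():
        for word in tokenize(body):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def score(index, query):
    """Rank emails by the fraction of query words they contain, best first."""
    words = set(tokenize(query))
    if not words:
        return []
    hits = defaultdict(int)
    for word in words:
        for name in index.get(word, []):
            hits[name] += 1
    ranked = [(count / len(words), name) for name, count in hits.items()]
    # Highest score first; ties are broken by name for a stable order.
    return sorted(ranked, key=lambda pair: (-pair[0], pair[1]))
```

Writing each key out as a file in a directory would give the on-disk layout from the paragraph above.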
This setup gives you a great deal of flexibility in presentation, as well as in search abilities.
Or search for a module to do this work for you; the only thing a module would be missing is the ability to handle missed OCR reads.
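One hedged way to cope with missed OCR reads is to look up words that are merely close to the search term. The sketch below, in Python, uses the standard library's `difflib.get_close_matches`; it assumes an index that maps each word to a list of email names, and the function name `fuzzy_lookup` and the 0.8 cutoff are arbitrary choices for illustration that would need tuning against real scans.

```python
import difflib


def fuzzy_lookup(index, word, cutoff=0.8):
    """Return names of emails whose indexed words closely resemble `word`.

    `index` maps a word to a list of email names. A word mangled by OCR
    (for example a zero read in place of the letter o) can still match
    as long as its similarity ratio is at or above `cutoff`.
    """
    candidates = difflib.get_close_matches(
        word.lower(), list(index), n=5, cutoff=cutoff
    )
    names = set()
    for candidate in candidates:
        names.update(index[candidate])
    return sorted(names)
```

A lower cutoff catches more misreads but also pulls in more unrelated words, so it trades recall against false hits.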
Good Luck!
dageek