I am working with a large text of about 6.5 MB, whose words will be indexed to the paragraphs in which they occur. A search on a word through a browser interface will return the matching paragraphs. My question is: what is the optimum number of files in which to store the text from which the paragraphs will be extracted? The text naturally lends itself to storage in 1, 4, 197, or 1628 files.
A search could return a few paragraphs, or hundreds, even thousands. My guess is that a few results would be extracted most quickly from a few small files, whereas a large number of results would be better extracted from one large file. Is this correct? What are the relative impacts of the number and size of files on access speed? What are the criteria for balancing them? Should I simply seek the middle way? Are there other factors I should consider?
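To make the setup concrete, here is a minimal sketch of the kind of index I have in mind, written in Python purely for illustration. It assumes paragraphs are separated by blank lines and that the index stores, for each word, the file, byte offset, and byte length of every paragraph containing it. The names `build_index` and `fetch` are hypothetical, not part of any existing code.

```python
import re
from collections import defaultdict

# Assumption: a paragraph is a maximal run of non-empty lines.
PARAGRAPH = re.compile(rb"(?:[^\n]+(?:\n|$))+")

def build_index(paths):
    """Map each lowercased word to (path, byte offset, byte length) of its paragraphs."""
    index = defaultdict(list)
    for path in paths:
        with open(path, "rb") as fh:
            data = fh.read()
        for match in PARAGRAPH.finditer(data):
            text = match.group().rstrip(b"\n")
            location = (path, match.start(), len(text))
            # Use a set so a word repeated within a paragraph is recorded once.
            for word in set(re.findall(r"\w+", text.decode("utf-8").lower())):
                index[word].append(location)
    return index

def fetch(index, word):
    """Return the paragraphs containing `word`, opening each file only once."""
    by_file = defaultdict(list)
    for path, offset, length in index.get(word.lower(), []):
        by_file[path].append((offset, length))
    results = []
    for path, spans in by_file.items():
        with open(path, "rb") as fh:
            # Read in ascending offset order so access within a file is sequential.
            for offset, length in sorted(spans):
                fh.seek(offset)
                results.append(fh.read(length).decode("utf-8"))
    return results
```

With offsets stored this way, each paragraph costs one seek and one read whatever the size of the file, so under this layout the number of files would mainly change how many files have to be opened per search. Whether that matters in practice at 6.5 MB is part of what I am asking.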