di has asked for the wisdom of the Perl Monks concerning the following question:
I am working with a large text of about 6.5 MB, the words of which will be indexed to the paragraphs in which they occur. A search on a word through a browser interface will return the matching paragraphs. My question is: what is the optimum number of files in which to store the text from which the paragraphs will be extracted? The text would naturally lend itself to storage in 1, 4, 197, or 1628 files.
A search could return a few paragraphs, or hundreds, even thousands. My guess is that a few returns would be extracted most quickly from a few small files, whereas a large number of returns would be extracted more quickly from one large file. Is this correct? What are the relative impacts of the number and size of files on access speed? What are the criteria for balancing them? Should I simply seek the middle way? Are there other factors I should consider?
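For context, here is a minimal sketch of one way to frame the trade-off: at 6.5 MB the whole text fits comfortably in memory, and an index mapping each word to byte offsets makes retrieval cost depend on the number of hits rather than on how the text is split across files. The file name `paragraphs.txt`, the blank-line paragraph separator, and the `lookup` interface are all assumptions, not anything from the thread; the sketch also presumes Unix newlines and a single-byte encoding so character offsets equal byte offsets.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical layout: all paragraphs in one flat file, separated by
# blank lines.
my $file = 'paragraphs.txt';

# Slurp the text (6.5 MB fits easily in memory) and build the index:
# word => list of [byte offset, length] pairs, one pair per paragraph
# in which the word occurs.
open my $in, '<', $file or die "Cannot open $file: $!";
my $text = do { local $/; <$in> };
close $in;

my %index;
my $pos = 0;
# Split with a capturing group so the blank-line separators are
# returned too and the running byte offset stays exact.
for my $chunk ( split /(\n{2,})/, $text ) {
    if ( $chunk =~ /\S/ ) {    # a paragraph, not a separator
        my %seen;
        for my $word ( map { lc } $chunk =~ /(\w+)/g ) {
            push @{ $index{$word} }, [ $pos, length $chunk ]
                unless $seen{$word}++;
        }
    }
    $pos += length $chunk;
}

# Retrieval: one seek and one short read per hit, so the cost scales
# with the number of matching paragraphs, not with the file's size.
sub lookup {
    my ($word) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @paras;
    for my $hit ( @{ $index{ lc $word } || [] } ) {
        my ( $offset, $len ) = @$hit;
        seek $fh, $offset, 0 or die "seek failed: $!";
        read $fh, my $para, $len;
        push @paras, $para;
    }
    close $fh;
    return @paras;
}

print "$_\n\n" for lookup('monk');
```

With an offset index like this, each hit costs one seek and one short read, so whether the paragraphs live in 1 file or 1628 matters far less than the number of matches returned.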
Re: Balancing number of files against size of files in optimizing access speed
by GrandFather (Saint) on Jan 09, 2010 at 21:46 UTC
Re: Balancing number of files against size of files in optimizing access speed
by sflitman (Hermit) on Jan 09, 2010 at 22:18 UTC