in thread Store a huge amount of data on disk
Sounds like you're indexing your data by a hex-encoded digest?
Given that you have 3 variable-length and possibly huge chunks (which most RDBMSs handle by writing to the filesystem anyway) associated with each index key, and your selection criteria are both fixed and simple, I'd use the filesystem.
Subdivide the key into chunks so that each directory contains at most a reasonable number of entries, then store the 3 sections in files at the deepest level.
Splitting a 32-character hex digest into 4-char chunks means no directory has more than 65,536 (16^4) entries. The file-system cache will cache the lower levels and the upper levels will be both fast to read from disk and quick to search, especially if your file-system hashes its directory entries.
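For illustration, here is a minimal Python sketch of that key-to-directory mapping. The function names (`digest_to_dir`, `max_entries_per_dir`), the `/data` root and the `width` parameter are my own assumptions for the example, not anything from the original question; it only builds the path and does not touch the disk.

```python
from pathlib import PurePosixPath

HEX_DIGITS = set("0123456789abcdef")


def digest_to_dir(digest, root="/data", width=4):
    """Map a hex digest to a nested directory path, one level per chunk.

    `root` and `width` are illustrative defaults (assumptions), chosen to
    match the 4-char layout discussed in this post.
    """
    digest = digest.lower()
    if not digest or len(digest) % width != 0:
        raise ValueError("digest length must be a non-zero multiple of width")
    if not set(digest) <= HEX_DIGITS:
        raise ValueError("digest must contain only hex digits")
    # Cut the digest into consecutive fixed-width pieces.
    chunks = [digest[i:i + width] for i in range(0, len(digest), width)]
    return PurePosixPath(root, *chunks)


def max_entries_per_dir(width):
    """Upper bound on subdirectories per level: 16 possibilities per hex char."""
    return 16 ** width
```

The chunk width is the knob: wider chunks mean fewer levels but a larger possible fan-out per directory, narrower chunks the reverse. `max_entries_per_dir` gives the worst case for a given width; a sparsely populated key space will stay well below it at the deeper levels.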
I'd write the individual chunks of the two text parts in separate files unless they will always be loaded as a single entity, in which case it might be slightly faster to concatenate them.
Overall, given a digest of 8fbe7eb8c04c744406cca0aeb67e4f7f, I'd lay the directory structure out like this:
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/meta.txt
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1.000
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1.001
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1.002
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text1....
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text2.000
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text2.001
/data/8fbe/7eb8/c04c/7444/06cc/a0ae/b67e/4f7f/text2....
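A rough Python sketch of writing and reading one record in that layout might look like the following. `store_record` and `load_chunks` are hypothetical helper names, and I'm assuming the chunks arrive as byte strings and that a three-digit suffix (so at most 1000 chunks per text part) is enough; adjust the padding if it isn't.

```python
from pathlib import Path


def store_record(record_dir, meta, text1_chunks, text2_chunks):
    """Write meta.txt plus numbered chunk files into the record's directory.

    `record_dir` is the deepest-level directory for one digest. Chunk
    files are named text1.000, text1.001, ... (assumed naming, max 1000).
    """
    record_dir = Path(record_dir)
    # Create every intermediate directory level in one call.
    record_dir.mkdir(parents=True, exist_ok=True)
    (record_dir / "meta.txt").write_text(meta, encoding="utf-8")
    for name, chunks in (("text1", text1_chunks), ("text2", text2_chunks)):
        for n, chunk in enumerate(chunks):
            (record_dir / "{}.{:03d}".format(name, n)).write_bytes(chunk)


def load_chunks(record_dir, name):
    """Return the chunks of one text part, in suffix order."""
    # Zero-padded suffixes sort correctly as plain strings.
    paths = sorted(Path(record_dir).glob(name + ".[0-9][0-9][0-9]"))
    return [p.read_bytes() for p in paths]
```

Keeping each chunk in its own file means a single chunk can be read or replaced without touching the rest. If the parts are always loaded whole, `b"".join(load_chunks(record_dir, "text1"))` shows the cost you would save by storing them concatenated in the first place.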