Hm. So you have 1e10 * 1e3 * 200 * 4-byte indices = 8 petabytes of indices pointing to 1e3 * 1e6 * (sizeof "small hash") = 1 billion small hashes. And you wanted to use Storable to load this from disk? You won't even be able to hold the 'smaller' dataset in memory, let alone 5% of the larger one, unless you have some pretty special hardware available.
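To check that arithmetic yourself, here is a minimal sketch. The factors are copied straight from the figures above; the variable names are mine and carry no meaning beyond that.

```perl
use strict;
use warnings;

# Factors as quoted in the post; names are illustrative only.
my @index_factors = ( 1e10, 1e3, 200 );
my $bytes_per_index = 4;

# Multiply everything out to get the total size of the indices in bytes.
my $index_bytes = $bytes_per_index;
$index_bytes *= $_ for @index_factors;

# Number of small hashes the indices point at.
my $small_hashes = 1e3 * 1e6;

printf "indices:      %.0f petabytes\n", $index_bytes / 1e15;
printf "small hashes: %.0f\n",           $small_hashes;
```

It uses decimal petabytes (1e15 bytes); with binary units the figure comes out a little lower, but the order of magnitude is the point.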

On x64 Linux, you'll find that you have a 256 GB maximum memory size, which means each of your 1 billion small hashes would have to be less than 274 bytes including overhead. Given that %a = 'a'..'z'; requires 1383 bytes on x64, they would have to be very small to fit into a fully populated x64 machine, even if you ignore any memory used by the OS.
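A rough sketch of that per-hash budget, assuming the 256 GB limit is read as 256 * 2**30 bytes. The measurement at the end only runs if the CPAN module Devel::Size happens to be installed, and the number it reports will vary with your perl version and build.

```perl
use strict;
use warnings;

# Assumption: 256 GB is taken as 256 GiB.
my $ram_bytes  = 256 * 2**30;
my $hash_count = 1e3 * 1e6;      # 1 billion small hashes

# Bytes available per hash if nothing else used any memory at all.
my $budget = $ram_bytes / $hash_count;
printf "budget per hash: %.1f bytes\n", $budget;

# The example hash from the post: 26 letters become 13 key/value pairs.
my %a = ( 'a' .. 'z' );

# Optional measurement; skipped quietly when Devel::Size is not available.
if ( eval { require Devel::Size; 1 } ) {
    printf "%%a occupies %d bytes on this perl\n",
        Devel::Size::total_size( \%a );
}
```

Compare whatever your build reports for %a against the budget line to see how little room each hash would have.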

And the largest single file (ext3) is 16 terabytes, so you would have to split your large dataset across 500 files minimum, assuming you could find a NASD device capable of handling 8 petabytes.
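The file count follows from a single division. A small sketch, using decimal units and the 16 terabyte per-file limit quoted above:

```perl
use strict;
use warnings;
use POSIX qw(ceil);

my $dataset_bytes  = 8e15;     # 8 petabytes of indices
my $max_file_bytes = 16e12;    # 16 terabytes per file, as quoted above

# Round up: a partly filled last file still counts as a file.
my $min_files = ceil( $dataset_bytes / $max_file_bytes );

print "minimum number of files: $min_files\n";
```

Any per-file overhead or a lower practical file size limit only pushes that count up.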

Your only reasonable way forward (using commodity hardware) is to a) stop overestimating your growth potential; b) partition your dataset into usable subsets.
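As one possible shape for point b), here is a toy sketch that splits a hash of small hashes into several Storable files and loads only the file a lookup needs. The partition count, the modulo scheme, the file names and the record layout are all hypothetical; your real data will want a partitioning key that matches how you query it.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempdir);
use File::Spec;

my $dir        = tempdir( CLEANUP => 1 );   # scratch directory for the demo
my $partitions = 4;                         # hypothetical partition count

# Toy stand-in for the real data: id => small hash.
my %data = map { $_ => { id => $_, len => $_ * 10 } } 1 .. 20;

# Hypothetical scheme: partition number is the id modulo the partition count.
sub partition_of   { my ($id) = @_; return $id % $partitions }
sub partition_file { my ($p)  = @_; return File::Spec->catfile( $dir, "part_$p.sto" ) }

# Split the data into buckets and write each bucket to its own file.
my @buckets;
$buckets[ partition_of($_) ]{$_} = $data{$_} for keys %data;
nstore( $buckets[$_] || {}, partition_file($_) ) for 0 .. $partitions - 1;

# A lookup reads one partition from disk, never the whole dataset.
sub lookup {
    my ($id) = @_;
    my $subset = retrieve( partition_file( partition_of($id) ) );
    return $subset->{$id};
}

my $rec = lookup(7);
print "record 7 has len $rec->{len}\n";
```

In real use you would keep the most recently loaded partition cached rather than calling retrieve on every lookup.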

I was under the impression that the largest genome was the human at 3e9?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

In reply to Re^14: Storing large data structures on disk by BrowserUk
in thread Storing large data structures on disk by roibrodo
