You already know that hash lookups have an O(1) order of growth in computational complexity as the hash grows. Collisions aren't that big of a deal; the chains, even for gigantic hashes, are kept pretty small. Rehashing is a more significant problem, but that only takes place as the hash grows, not once it has arrived at its finished state.

Maybe if we knew more about the nature of your "big hash" data set we might find ways of pre-processing or re-structuring it for faster searches. But in the "general" case, it's hard to beat what you've got already.

One generalized solution would be to use workers: share your hash by creating it first, and then forking worker processes -- possibly one or two per processor core. As long as they never write to the original data, the "copy on write" mechanism won't be engaged, and they will share the data set. Let them keep track of their own tallies, and when they're done, let them report back to the parent script who can reduce the work each worker provided back down to a single final answer.

There may exist less general, but more optimal solutions for your specific data set, but we would need to have a better understanding of that data set before we could include or eliminate such strategies.


Dave


In reply to Re^3: Small Hash a Gateway to Large Hash? by davido
in thread Small Hash a Gateway to Large Hash? by lsherwood

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.