Howdy

I have a basic conceptual question about Perl's implementation of hashes. First let me give the problem background.

I have two tables in my database a{id, acc} and b{build, acc}. I almost always need to have {id, build} tuples and because my tables are fairly large (5 million rows each, roughly) I would like to create a c{id, build} table. Note that there an acc is not necessarily found in either a or b -- each table contains some not found in the other.

A straight SQL join statement is running horribly slow on my DB and our DBA can offer no suggestions for tuning.

As a work-around, I was thinking of retrieving all of table b into a hash: $data{acc} = build. The underlying hash relationship of one build per acc is guaranteed to be valid.

So, the question is, how can I determine how much memory this will take? Naively I would figure that it would require bytes(acc) + bytes(build) which in my case is 60. At that point I could just do a (large) number of hash-lookups, which should be relatively quick. I would write to file, then bulk-load the data and build indexes.

Only... 60 bytes for 5 million records appears to be about 300 MB. That's easily do-able but it doesn't take into account overhead.

Aside from the caveat that doing this in the DB properly ought to be faster, is my approach reasonable and will memory usage be anywhere close to 300 MB (I have about 2 GB available for this)?


In reply to Space Efficiency of Hashes by Anonymous Monk

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.