Hashes, by their very nature, are often full of empty pointers, each of which takes at least 4 bytes on a 32-bit platform. If your elements are only 30 bytes each, a large share of this memory is probably sitting in unused hash slots. There's not much you can do about this, but if you are running short on RAM, you can use a "tied" hash, as described below.

Even with lists, the Perl interpreter must store information besides the data itself, such as the length of the scalar, reference counts, and so forth. For a single scalar this doesn't add up to much: I would expect a minimum of 12-16 bytes to be allocated just for this kind of internal information. With several million tiny scalars being allocated, though, it can add up to a lot of data.

In compiled languages like C, when you ask for an array of 1,000,000 30-byte strings, that's what you get, usually as one big contiguous chunk of RAM. With Perl, unless you want to do something really crazy, like pack your strings into one giant scalar and use substr to extract them, you have to live with the overhead. Usually it's not so bad, but it might come as a bit of a surprise.
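For the curious, here is a rough sketch of what the "really crazy" approach might look like. It assumes every string fits in a fixed 30-byte slot; the names `$WIDTH`, `packed_push`, `packed_get` and `packed_count` are made up for this illustration and are not from any module.

```perl
use strict;
use warnings;

# Assumption: every record fits in a fixed-width slot of 30 bytes.
my $WIDTH = 30;

# Append one string to the buffer, space-padded to $WIDTH bytes.
sub packed_push {
    my ($buf, $str) = @_;
    die "record longer than $WIDTH bytes\n" if length($str) > $WIDTH;
    $$buf .= pack("A$WIDTH", $str);
    return;
}

# Fetch record number $i (0-based). unpack "A" strips the trailing
# padding, so trailing whitespace in the original data is lost too.
sub packed_get {
    my ($buf, $i) = @_;
    return unpack("A$WIDTH", substr($$buf, $i * $WIDTH, $WIDTH));
}

# Number of records currently stored in the buffer.
sub packed_count {
    my ($buf) = @_;
    return length($$buf) / $WIDTH;
}

my $names = '';
packed_push(\$names, $_) for qw(alice bob carol);
print packed_get(\$names, 1), "\n";    # prints "bob"
```

The whole list lives in one scalar, so you pay the per-scalar overhead once instead of once per element. The price is that every access goes through substr and you lose normal array semantics.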

Since you have about 30 MB of real data, you are seeing roughly a 1:3 expansion when it is loaded into RAM. If this is a problem, you can use a "tied" hash, which uses far less RAM but is disk-based and a fair bit slower. Tied hashes and regular hashes are used the same way, though: you only need to add a few lines to tie the hash, and the rest of your code can stay the same.
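A minimal sketch of tying a hash to disk follows. I haven't said which DBM module to use, so this picks SDBM_File purely because it ships with Perl; DB_File or another backend may suit you better. SDBM also caps the size of each key/value pair at roughly 1 KB, which is fine for 30-byte elements. The temporary directory and the `records` file name are placeholders.

```perl
use strict;
use warnings;
use Fcntl;                     # O_RDWR, O_CREAT
use SDBM_File;                 # assumption: any DBM backend would do
use File::Temp qw(tempdir);

# Placeholder location; use a real path for data you want to keep.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/records";

# These few lines are the only change: %h now lives on disk.
my %h;
tie(%h, 'SDBM_File', $file, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $file: $!";

# From here on, %h is used like any other hash.
$h{"key$_"} = "value $_" for 1 .. 1000;
print $h{key42}, "\n";         # prints "value 42"

untie %h;                      # flush and close the underlying files
```

Each lookup or store now goes to the DBM file rather than RAM, which is where the slowdown comes from.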

In reply to Re: List overhead by tadman
in thread List overhead by malloc
