First let me note that the code you have given cannot be the code behind your stated problem. For one thing, it has a syntax error. For another, you are using the hash values in a DBM as if they were anonymous arrays. That generally won't work, because a DBM stores only strings and each reference gets stringified when it is stored.
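To illustrate the stringification point, here is a minimal sketch. It does not touch a real DBM; the string concatenation merely stands in for what happens when a reference is stored as a DBM value, and the join/split workaround assumes the separator never appears in your data.

```perl
use strict;
use warnings;

my @record = (1, 2, 3);

# A DBM can only store strings, so a reference is flattened to its
# string form on the way in. Concatenation simulates that here.
my $stored = "" . \@record;
print "$stored\n";    # prints something of the form ARRAY(0x...)

# One possible workaround: flatten the list yourself before storing,
# and split it again after fetching. This assumes "\0" never occurs
# in the data.
my $packed   = join "\0", @record;
my @restored = split /\0/, $packed;
print "@restored\n";
```

Once the reference has been turned into a string, the original array cannot be recovered from it, which is why the values have to be flattened by hand (or by a serializer) before they go in.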

However, assuming that your problem as stated is real, I would first check whether the file DB_File is working with has exceeded the large-file limits of your operating system and/or file system. That is the likely cause if the file is about 2 GB. If so, try upgrading your operating system. Current versions of most operating systems should handle files of hundreds of terabytes.
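A quick way to check is to look at the size of the file and at whether your perl was built with large file support. This is only a sketch: the file name data.db and the helper near_2gb_limit are made up for illustration, and the 1 MB slack is an arbitrary choice.

```perl
use strict;
use warnings;
use Config;

# 2 GB is the largest offset a signed 32-bit integer can hold.
my $LIMIT = 2**31;

# True if a file of $size bytes is within $slack bytes of the limit,
# or already past it. (Hypothetical helper, not part of any module.)
sub near_2gb_limit {
    my ($size, $slack) = @_;
    $slack = 1024 * 1024 unless defined $slack;
    return $size >= $LIMIT - $slack;
}

# Hypothetical file name; pass the real DBM file on the command line.
my $file = @ARGV ? $ARGV[0] : 'data.db';
my $size = -s $file;    # false if the file is missing or empty

if ($size) {
    print "$file is $size bytes",
        (near_2gb_limit($size) ? " (at or near the 2 GB limit)" : ""), "\n";
}
else {
    print "$file is missing or empty\n";
}

# 'define' here means this perl was compiled with large file support.
my $large = defined $Config{uselargefiles}
    && $Config{uselargefiles} eq 'define';
print "perl large file support: ", ($large ? "yes" : "no"), "\n";
```

If the file sits right at 2 GB and the last line says "no", the large file limit is the first suspect.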

Based on the numbers you have given, though, I suspect that you have hit a different limit. If your machine is 32-bit (if it is on Intel hardware then it almost assuredly is) then I would wonder whether somewhere within DB_File it keeps track of a pointer itself, and that limits the size of file it can address. (Berkeley DB itself has no such size limit, so it would be in the interface.) My first shot would be to try the newer BerkeleyDB module, and then report the bug to Paul Marquess with a short program that produces the bad data set on your system, along with my guess as to the problem. (If there is a difference in behaviour between the two modules, be sure to tell him that as well.) Only do this if you are not hitting a limit at 2 GB. He knows all about the 2 GB limit; it isn't his bug, and the only thing he can tell you is to upgrade.

If you are hitting large file limits, you can still get around them, but it will be slower. What you need to do is sit down with perltie and figure out how to write your own tied interface. For instance, you could have 4 DBMs on disk and use ord($key) & 3 to decide which one a given key/value pair goes into. Since each one then gets only about 1/4 of the data (assuming the first characters of your keys are reasonably evenly spread), none of them should hit the size limit.
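Here is a rough sketch of what such a tied interface might look like. The package name ShardedHash is made up, and so are the file names in the comment. To keep the example runnable anywhere, plain hashes stand in for the four on-disk DBMs; the comment shows how each part could be tied to its own DB_File first. Treat it as a starting point rather than a finished module.

```perl
use strict;
use warnings;

# Hypothetical class: a tied hash that spreads its keys over several
# underlying hashes, each of which may itself be tied to a DBM file.
package ShardedHash;

# Usage: tie %h, 'ShardedHash', \%part0, \%part1, \%part2, \%part3;
sub TIEHASH {
    my ($class, @parts) = @_;
    my $n = @parts;
    # The bit mask trick only works for a power-of-two number of parts.
    die "need a power-of-two number of parts\n" if $n < 1 or $n & ($n - 1);
    return bless { parts => \@parts, mask => $n - 1, cur => 0 }, $class;
}

# Pick the part from the low bits of the key's first character.
# ord('') is 0, so the empty key lands in part 0.
sub _part {
    my ($self, $key) = @_;
    return $self->{parts}[ ord($key) & $self->{mask} ];
}

sub FETCH  { my ($self, $key) = @_; return $self->_part($key)->{$key} }
sub STORE  { my ($self, $key, $val) = @_; $self->_part($key)->{$key} = $val }
sub EXISTS { my ($self, $key) = @_; return exists $self->_part($key)->{$key} }
sub DELETE { my ($self, $key) = @_; return delete $self->_part($key)->{$key} }
sub CLEAR  { my ($self) = @_; %$_ = () for @{ $self->{parts} } }

# Iteration walks the parts one after another.
sub FIRSTKEY {
    my ($self) = @_;
    $self->{cur} = 0;
    keys %$_ for @{ $self->{parts} };    # reset each part's iterator
    return $self->NEXTKEY;
}

sub NEXTKEY {
    my ($self) = @_;
    while ($self->{cur} < @{ $self->{parts} }) {
        my $key = each %{ $self->{parts}[ $self->{cur} ] };
        return $key if defined $key;
        $self->{cur}++;                  # this part is exhausted
    }
    return undef;
}

package main;

# Plain hashes stand in for the four DBMs. In real use each part would
# be tied to its own file before being handed over, along these lines:
#   use DB_File; use Fcntl;
#   tie %{ $parts[$i] }, 'DB_File', "data$i.db",
#       O_RDWR|O_CREAT, 0666, $DB_HASH or die "data$i.db: $!";
my @parts = ({}, {}, {}, {});
tie my %data, 'ShardedHash', @parts;

$data{$_} = length $_ for qw(apple banana cherry date);
print "$_ => $data{$_}\n" for sort keys %data;
```

With these four keys, ord of the first letter & 3 sends apple to part 1, banana to part 2, cherry to part 3 and date to part 0. Real keys that mostly begin with the same few characters would split far less evenly, in which case a checksum of the whole key would be a better choice than ord($key).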


In reply to Re (tilly) 1: millions of records in a Hash by tilly
in thread millions of records in a Hash by johnkj
