Hello dear esteemed PerlMonks,

I am using a tied SDBM file to store a large volume of data and I get the following error after about 20 minutes of running time:

sdbm store returned -1, errno 22, key "0677202576" at -e line 11, <> line 15830957.

As can be seen, the keys for my hash are just ten-digit phone numbers, so the problem has nothing to do with the key being too long.

It seems more likely that I have hit some size limit in the DBM library. The sizes of the DBM files after the failure are:

-rwxr-xr-x 1 prod dqd     262144 2013-03-15 10:23 DBM_DOS.dir
-rwxr-xr-x 1 prod dqd 2147429376 2013-03-15 10:23 DBM_DOS.pag

The size of the .pag file, 2147429376 bytes, is just under 2^31 (2147483648), which may be a hard limit in the underlying C libraries.
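To make the arithmetic explicit, here is a small sketch that computes how far the .pag file is from 2^31. The idea that the limit is a signed 32-bit file offset is only an assumption; nothing in the error message confirms it.

```perl
use strict;
use warnings;

# Size of DBM_DOS.pag, taken from the listing above.
my $pag_size = 2_147_429_376;

# Assumption: the limit is a signed 32-bit file offset, i.e. 2^31 bytes.
my $limit = 2**31;

my $headroom = $limit - $pag_size;

# %.0f is used so the output does not depend on the integer width of this perl.
printf "2^31     = %.0f bytes\n", $limit;
printf "headroom = %.0f bytes (%.0f KiB)\n", $headroom, $headroom / 1024;
```

The remaining headroom is 54272 bytes, so the file stopped growing 53 KiB short of the 2 GiB mark.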

The platform is AIX 6.1.6.15.

Please note that I have also tried dbmopen, NDBM and ODBM, with a similar problem: I am not able to load all my data (the input has 30.4 million records) with any of these.
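For readers who want to reproduce this, the kind of loading loop involved can be sketched as below. This is not the original one-liner: the helper name `load_dbm` and the "key value" line layout are assumptions made for illustration. NDBM_File and ODBM_File take the same `tie` arguments, so trying them amounts to swapping the class name, where they are available on the platform.

```perl
use strict;
use warnings;
use Fcntl;        # exports O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;

# Hypothetical helper: load "key value" lines from a filehandle into an
# SDBM file and return the number of records stored.
sub load_dbm {
    my ($in_fh, $dbm_path) = @_;

    tie my %h, 'SDBM_File', $dbm_path, O_RDWR | O_CREAT, 0644
        or die "Cannot tie $dbm_path: $!";

    my $count = 0;
    while (my $line = <$in_fh>) {
        chomp $line;
        my ($key, $value) = split ' ', $line, 2;
        next unless defined $key;       # skip blank lines

        # When the underlying store fails, this assignment dies with
        # the "sdbm store returned -1, errno ..." message.
        $h{$key} = defined $value ? $value : '';
        $count++;
    }

    untie %h;
    return $count;
}
```

A call such as `load_dbm(\*STDIN, 'DBM_DOS')` would create DBM_DOS.dir and DBM_DOS.pag in the current directory.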

I have found another way of doing what I needed, but it would still be very nice to be able to use tied hashes on such volumes of data. Does anyone know of a recipe or workaround that makes this possible with tied hashes?

As a side note, this problem led me to develop a file comparison script that reads two sorted files, A and B, in parallel and writes three output files: records that are in both A and B, records that are only in A, and records that are only in B. I looked for modules doing this type of file comparison and there do not seem to be any (or they use hashes, which makes them unusable for large data sets). Since this is something I do regularly, I have put this utility in a module that I can easily reuse.

The question is: in your opinion, would it be useful to make this module available on CPAN? I am asking because it would require quite a bit of additional work on my part to make it more general-purpose than it currently is, and I would not want to do that work unless it can be really useful to other people (not to speak of the other things that would have to be done, such as providing a test suite and installation procedures and scripts, which I have absolutely no clue how to do).
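The parallel read of two sorted files described above can be sketched as follows. This is not the module in question, only a minimal illustration of the technique. The name `compare_sorted` is hypothetical, and the sketch assumes that whole lines are compared, that both inputs are sorted in ascending string order, and that neither file contains duplicate lines.

```perl
use strict;
use warnings;

# Hypothetical helper: walk two sorted filehandles in step and send each
# line to one of three output handles (both, only in A, only in B).
# Memory use stays constant because only one line per input is held.
sub compare_sorted {
    my ($fh_a, $fh_b, $out_both, $out_only_a, $out_only_b) = @_;

    my $line_a = <$fh_a>;
    my $line_b = <$fh_b>;

    while (defined $line_a and defined $line_b) {
        chomp(my $key_a = $line_a);
        chomp(my $key_b = $line_b);
        my $cmp = $key_a cmp $key_b;    # string order, must match the sort

        if ($cmp == 0) {                # present in both: advance both
            print {$out_both} "$key_a\n";
            $line_a = <$fh_a>;
            $line_b = <$fh_b>;
        }
        elsif ($cmp < 0) {              # A is behind: line exists only in A
            print {$out_only_a} "$key_a\n";
            $line_a = <$fh_a>;
        }
        else {                          # B is behind: line exists only in B
            print {$out_only_b} "$key_b\n";
            $line_b = <$fh_b>;
        }
    }

    # One input is exhausted; whatever remains in the other is unique to it.
    while (defined $line_a) {
        chomp $line_a;
        print {$out_only_a} "$line_a\n";
        $line_a = <$fh_a>;
    }
    while (defined $line_b) {
        chomp $line_b;
        print {$out_only_b} "$line_b\n";
        $line_b = <$fh_b>;
    }
    return;
}
```

For fixed-width keys such as ten-digit phone numbers, string order and numeric order coincide, so input produced by a plain `sort` is suitable as long as it uses the same collation as `cmp`.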


In reply to Problems with SDBM by Laurent_R
