in reply to millions of records in a Hash

First, let me note that the code you have given cannot be the code behind your stated problem. For one thing, it has a syntax error. For another, you are using the hash values in a DBM as if they were anonymous arrays. That generally won't work, because DBM values are stored as plain strings and a reference gets stringified on the way in.
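To make the stringification point concrete, here is a minimal sketch. It uses SDBM_File rather than DB_File only because SDBM_File ships with Perl (an assumption made for portability; a plain DB_File tie flattens references the same way). The file name and the keys are made up for illustration.

```perl
use strict;
use warnings;
use Fcntl;
use SDBM_File;                      # stand-in for DB_File in this sketch
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);    # scratch directory, removed at exit
tie my %db, 'SDBM_File', "$dir/demo", O_RDWR | O_CREAT, 0666
    or die "Cannot tie: $!";

# Storing an anonymous array: only its string form reaches the file.
$db{fruit} = [ 'apple', 'pear' ];
my $stored = $db{fruit};
print "stored value: $stored\n";    # looks like ARRAY(0x...), not the data
print "still a reference? ", (ref $stored ? "yes" : "no"), "\n";

# One simple workaround: flatten the list yourself and split it on the way out.
$db{fruit} = join "\0", 'apple', 'pear';
my @fruit = split /\0/, $db{fruit};
print "recovered: @fruit\n";
```

Under `use strict`, trying `$db{fruit}[0]` on the first stored value would die, because the value is just a string by then.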

However, assuming that your problem is as stated, I would first check whether the file DB_File is working with has exceeded the large-file limits of your operating system and/or file system. That is the likely cause if the file is about 2 GB. If so, try upgrading your operating system. Current versions of most operating systems should handle files of hundreds of terabytes.
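A quick way to test the 2 GB theory is to look at the size of the file on disk. The sketch below is only an illustration: the helper name near_large_file_limit and the 16 MB slack are my own choices, and the path is whatever you pass on the command line.

```perl
use strict;
use warnings;
use Config;

use constant TWO_GB => 2**31;

# Hypothetical helper: true if $size is within $slack bytes of 2 GB or beyond it.
sub near_large_file_limit {
    my ($size, $slack) = @_;
    $slack = 16 * 1024 * 1024 unless defined $slack;
    return $size >= TWO_GB - $slack ? 1 : 0;
}

# Was this perl built with large file support?
print "perl large file support: ",
    ($Config{uselargefiles} ? "yes" : "no"), "\n";

my $file = shift @ARGV;             # path to the DBM file, if one was given
if (defined $file) {
    my $size = -s $file;            # undef if the file cannot be stat'ed
    die "Cannot get size of $file\n" unless defined $size;
    $size ||= 0;                    # -s gives an empty string for an empty file
    printf "%s is %.0f bytes%s\n", $file, $size,
        near_large_file_limit($size) ? " (at or near the 2 GB mark)" : "";
}
```

If the size sits just under 2147483648 bytes and refuses to grow, you are almost certainly looking at the large file limit rather than a DB_File bug.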

Based on the numbers you have given, though, I suspect that you have hit a different limit. If your machine is 32-bit (if it is on Intel hardware then it almost assuredly is), then I would wonder whether somewhere within DB_File it keeps track of a pointer itself, and whether that limits the size of file it can address. (Berkeley DB itself has no such size limit, so it would be in the interface.) My first shot would be to try the newer BerkeleyDB module and then report the bug to Paul Marquess, with a short program that produces the bad data set on your system, along with my guess as to the problem. (If there is a difference in behaviour between the two modules, be sure to tell him that as well.) Only do this if you are not hitting a limit at 2 GB. He knows all about the 2 GB limit, it isn't his bug, and the only thing he can tell you is to upgrade.

If you are hitting large file limits, you can still get around them, but it will be slower. What you need to do is sit down with perltie and figure out how to write your own tied interface. For instance, you could have 4 DBM files on disk and use ord($key) & 3 to decide which one a given key/value pair goes into. Since each file then gets only about 1/4 of the data (assuming the first characters of your keys are reasonably evenly spread), none of them will hit the size limit.
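Here is a rough sketch of what such a tied interface could look like. The package name ShardedDBM and the file naming scheme are invented for this example, and SDBM_File is used only so the sketch runs on a stock Perl; in real use you would tie each shard to DB_File (or BerkeleyDB) instead. Treat it as a starting point, not a finished module.

```perl
package ShardedDBM;                 # hypothetical name for this sketch
use strict;
use warnings;
use Fcntl;
use SDBM_File;                      # stand-in; swap in DB_File for real data

# Open four DBM files, "$basename.0" through "$basename.3".
sub TIEHASH {
    my ($class, $basename) = @_;
    my @shards;
    for my $i (0 .. 3) {
        my %h;
        tie %h, 'SDBM_File', "$basename.$i", O_RDWR | O_CREAT, 0666
            or die "Cannot open $basename.$i: $!";
        push @shards, \%h;
    }
    return bless { shards => \@shards, iter => 0 }, $class;
}

# The low two bits of the key's first character select the shard.
sub _shard {
    my ($self, $key) = @_;
    return $self->{shards}[ ord($key) & 3 ];
}

sub FETCH  { my ($self, $key) = @_; return $self->_shard($key)->{$key} }
sub STORE  { my ($self, $key, $val) = @_; $self->_shard($key)->{$key} = $val }
sub DELETE { my ($self, $key) = @_; return delete $self->_shard($key)->{$key} }
sub EXISTS { my ($self, $key) = @_; return exists $self->_shard($key)->{$key} }

# Iteration walks the shards one after another.
sub FIRSTKEY {
    my ($self) = @_;
    $self->{iter} = 0;
    keys %$_ for @{ $self->{shards} };      # reset each shard's iterator
    return $self->NEXTKEY;
}

sub NEXTKEY {
    my ($self) = @_;
    while ($self->{iter} < 4) {
        my $key = each %{ $self->{shards}[ $self->{iter} ] };
        return $key if defined $key;
        $self->{iter}++;                    # this shard is done, try the next
    }
    return;
}

package main;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);            # scratch directory for the demo
tie my %big, 'ShardedDBM', "$dir/records";

# These four keys happen to land in four different shards.
$big{alpha} = 'first';
$big{beta}  = 'second';
$big{delta} = 'third';
$big{gamma} = 'fourth';

print "$_ => $big{$_}\n" for sort keys %big;
```

The extra method call on every access is where the slowdown comes from. If four files are not enough, a wider mask (say & 15 with 16 files) follows the same pattern.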

(crazyinsomniac) Re: Re (tilly) 1: millions of records in a Hash
by crazyinsomniac (Prior) on Feb 25, 2002 at 07:59 UTC