Valkerri has asked for the wisdom of the Perl Monks concerning the following question:

What is a reasonable limit for keys in a hash? (A quick search on PerlMonks for "hash + limit" did not turn up the answer I was looking for.)

PROBLEM: I am maintaining a customer database. Unfortunately, the ISP/host has repeatedly promised to install MySQL but has yet to do so. Until that happens, I am maintaining the database in flat files.

To update these files, I was planning to:

What do you think, in theory? Will this work? How many records can I expect to work with at a time?

Thank you in advance for your help.

Re: Limit to Hash Keys
by perrin (Chancellor) on Jan 11, 2002 at 21:34 UTC
    Use a dbm rather than a flat file. A good dbm can handle terabytes of data. The other part of your plan that sounds suspect is loading the e-mail data into an in-memory hash, which is obviously limited by the amount of free memory you have. You might have to load it in chunks if you don't have enough RAM.
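
    A minimal sketch of the dbm idea: tie a hash to an on-disk dbm file so lookups and stores go to disk instead of living entirely in RAM. SDBM_File ships with Perl (DB_File, using Berkeley DB, scales to much larger files if it's installed); the file name and "name|phone" record format here are invented for illustration.

```perl
use strict;
use warnings;
use Fcntl;       # exports O_RDWR, O_CREAT
use SDBM_File;

# Tie the hash to a dbm file on disk; keys and values are persisted.
my %customers;
tie %customers, 'SDBM_File', 'customers', O_RDWR | O_CREAT, 0644
    or die "Cannot tie dbm file: $!";

# This write goes to the dbm file, not just to memory.
# (The record format is made up for this example.)
$customers{'alice@example.com'} = 'Alice|555-0100';

print $customers{'alice@example.com'}, "\n";

untie %customers;
```

    The same code works with DB_File by changing the class name in the tie, and the data survives between runs of the script.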
Re: Limit to Hash Keys
by jwest (Friar) on Jan 11, 2002 at 21:36 UTC
    I've done this several times, to my dismay, working with several hundred thousand entries with little problem, save for the enormous amount of RAM consumed.

    A cleaner way to do this, if you're interested, might be to craft a Tie::Hash object that will interface with your database for you, letting you access it without loading it all at once.

    Depending on the format of your database, you might have better options as well, including using a DBI driver such as DBD::CSV.
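
    A rough sketch of the Tie::Hash idea: a hash tied to a pipe-delimited flat file, where each lookup scans the file for one record instead of loading everything into memory. The file name and "email|data" record format are made up for illustration, and a production version would also need STORE, DELETE, FIRSTKEY, and NEXTKEY.

```perl
package FlatFileHash;
use strict;
use warnings;

sub TIEHASH {
    my ($class, $file) = @_;
    return bless { file => $file }, $class;
}

# Scan the flat file for the key; only one line is in memory at a time.
sub FETCH {
    my ($self, $key) = @_;
    open my $fh, '<', $self->{file} or die "Cannot open $self->{file}: $!";
    while (my $line = <$fh>) {
        chomp $line;
        my ($k, $v) = split /\|/, $line, 2;
        return $v if $k eq $key;
    }
    return undef;
}

sub EXISTS { defined $_[0]->FETCH($_[1]) }

package main;

# Create a tiny sample database for the demonstration.
open my $out, '>', 'customers.txt' or die $!;
print $out "alice\@example.com|Alice Smith\n";
print $out "bob\@example.com|Bob Jones\n";
close $out;

tie my %db, 'FlatFileHash', 'customers.txt';
print $db{'bob@example.com'}, "\n";   # Bob Jones
```

    The linear scan is slow for large files, but memory use stays constant no matter how many records the file holds.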

    --jwest

    -><- -><- -><- -><- -><-
    All things are Perfect
        To every last Flaw
        And bound in accord
             With Eris's Law
     - HBT; The Book of Advice, 1:7
    
Re: Limit to Hash Keys
by count0 (Friar) on Jan 11, 2002 at 21:33 UTC
    The number of keys in your hash is limited only by the system's resources (namely memory).

    A possible solution might be tied hashes, or dbm files.
Re: Limit to Hash Keys
by Kanji (Parson) on Jan 11, 2002 at 21:49 UTC

    You might want to look at DBD::AnyData and/or some of the other flat-file DBI drivers.

    That way, you can use SQL to compare new data against current as you would with MySQL, and -- if your ISP ever does get around to installing it -- migrating your code could be as trivial as changing the DSN.
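
    A sketch of the flat-file DBI approach using DBD::CSV (a CPAN module; DBD::AnyData works along the same lines). The table and column names are invented for illustration, and each CSV file in the working directory is treated as a table:

```perl
use strict;
use warnings;
use DBI;

# Connect to the current directory as if it were a database;
# DBD::CSV maps each CSV file to a table.
my $dbh = DBI->connect('dbi:CSV:f_dir=.', undef, undef,
                       { RaiseError => 1, PrintError => 0 });

$dbh->do('CREATE TABLE customers (email CHAR(64), name CHAR(64))');
$dbh->do(q{INSERT INTO customers VALUES ('alice@example.com', 'Alice')});

my ($name) = $dbh->selectrow_array(
    q{SELECT name FROM customers WHERE email = 'alice@example.com'}
);
print "$name\n";
```

    If the ISP ever installs MySQL, the SQL stays the same and only the connect line changes, to something like `DBI->connect('dbi:mysql:database=customers', $user, $pass)`.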

        --k.


Re: Limit to Hash Keys
by rje (Deacon) on Jan 11, 2002 at 22:50 UTC
    Here's my practical experience from work. Our scripts will usually slurp in large amounts of data and sort through it via hashtables and such. Such code frequently consumes 100% of the CPU for long periods of time, which suggests that we may be reading in too much data.

    I have found, on our 500- and 700-MHz machines, that 50 megabytes of data is too much for hashtables. The operating system may start swapping all that data to disk, in which case you may be in for loooong processing times. Or, for truly vast amounts of data, the program may simply die with an out-of-memory error.

    On the other hand it appears that 1 megabyte of data is not too bad for us. It seems to go into hashtables ok. Of course, we have 256 meg of RAM, so 1 meg here and there is "no big deal".
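
    One way to check this kind of observation empirically: the CPAN module Devel::Size reports how much memory a Perl data structure actually occupies, which is typically several times the size of the raw data because of per-key and per-value overhead. The key and value shapes below are made up for illustration.

```perl
use strict;
use warnings;
use Devel::Size qw(total_size);

# Roughly 1 MB of raw value data: 10,000 keys with 100-byte values.
my %h;
$h{"key$_"} = 'x' x 100 for 1 .. 10_000;

# The reported size will be noticeably larger than the raw 1 MB,
# reflecting Perl's hash and scalar overhead.
printf "total_size: %d bytes\n", total_size(\%h);
```

    Running something like this on a sample of your own records gives a much better estimate of the real memory cost than counting bytes in the flat file.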

    Your mileage may differ.

    rje