Many times a database table fits nicely in a list of lists (a two-dimensional array). Other times a hash works better. I worked on a project where a list of lists was chosen after extensive testing.

Questions to think about for hash versus array:

Is your data rectangular?
Regular database tables without NULLs fit nicely into a rectangular data structure. Irregular data fits better in hashes.
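Here is a rough sketch of the two shapes. The column names and values are made up for illustration; they are not from any real schema.

```perl
use strict;
use warnings;

# Rectangular data: every row has the same columns in the same order,
# so a list of lists works, with named constants for the column indexes.
use constant { ID => 0, NAME => 1, PRICE => 2 };

my @table = (
    [ 1, 'widget', 2.50 ],
    [ 2, 'gadget', 7.25 ],
);
my $second_name = $table[1][NAME];    # 'gadget'

# Irregular data: rows carry different fields, so a hash per row
# avoids storing placeholders for the missing columns.
my %sparse = (
    1 => { name => 'widget', price => 2.50 },
    2 => { name => 'gadget', colour => 'red' },
);
my $has_price = exists $sparse{2}{price} ? 1 : 0;    # 0
```

The constants keep the array version readable without paying for a hash on every row.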

What sort of iteration will you need to do?
If you will know the key to find your data element, a hash is much better than searching over a list. If you need to do a comparison on each key anyway, an array might be easier to search. For example, if you are checking each key to see if it matches a regular expression, an array can be better. (Let me know if you want to know why :-).
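A small sketch of the two access patterns, using invented sample rows:

```perl
use strict;
use warnings;

my @rows = ( [ 'apple', 3 ], [ 'apricot', 5 ], [ 'banana', 7 ] );

# Known key: build the index once, then each lookup is a single hash access.
my %by_name = map { ( $_->[0] => $_ ) } @rows;
my $banana_count = $by_name{banana}[1];    # 7

# Pattern match: every key has to be tested anyway, so scan the array directly.
my @matches     = grep { $_->[0] =~ /^ap/ } @rows;
my @match_names = map  { $_->[0] } @matches;    # apple, apricot
```

With the regex case you visit every element either way, so the hash buys you nothing for that query.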

Are the keys of the hash simple to implement?
If you have one database field that is the key, it is easy to use it as a hash key. If the database rows are keyed with multiple columns, the hash gets more complicated since you will need to combine columns to make the hash key.
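One way to combine columns, as a sketch. The two-column key (region, product id) and the "\0" separator are assumptions for the example; use any separator that cannot occur in your data.

```perl
use strict;
use warnings;

# Hypothetical rows keyed by two columns: (region, product_id).
my @rows = (
    [ 'east', 101, 'widget' ],
    [ 'west', 101, 'gadget' ],
);

# Join the key columns with a separator that cannot appear in the data.
# "\0" is an assumption here; pick whatever is safe for your columns.
my $SEP = "\0";
my %by_key;
for my $row (@rows) {
    my ( $region, $id, $name ) = @$row;
    $by_key{ join $SEP, $region, $id } = $name;
}

my $name = $by_key{ join $SEP, 'west', 101 };    # 'gadget'

# Recover the individual columns from a key when iterating.
my ( $region, $id ) = split /\0/, ( sort keys %by_key )[0];    # 'east', 101
```

Perl's built-in $hash{$x, $y} syntax does the same kind of join for you, using the value of $; as the separator.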

Do you need to keep reloading your data structures from the database, or are they static?
If they are static, you can use a few tricks that save memory. If you have a machine with shared libraries and copy-on-write virtual memory, you can get multiple mod_perl httpd processes to share the database data.
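The idea can be sketched with plain fork, outside of Apache: load the data in the parent, then let the children read it. Under mod_perl the equivalent would be loading the data in the parent server before it forks its children. The table contents here are invented, and how much memory actually stays shared depends on your OS and on how the children touch the data.

```perl
use strict;
use warnings;

# Load the static data once in the parent, before any fork.
# Here it is a made-up table; in practice it would come from the database.
my @table = map { [ $_, "row $_" ] } 1 .. 1000;

my @pids;
for my $child ( 1 .. 3 ) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ( $pid == 0 ) {
        # Child: reads the parent's data without reloading it.
        my $ok = $table[499][1] eq 'row 500';
        exit( $ok ? 0 : 1 );
    }
    push @pids, $pid;
}

# Parent: wait for the children and count how many saw the data.
my $children_ok = 0;
for my $pid (@pids) {
    waitpid $pid, 0;
    $children_ok++ if $? == 0;
}
```

Note that even reading Perl data can update reference counts, which may cause some pages to be copied, so measure rather than assume.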

You can presize arrays so that they have less memory overhead. For measuring memory consumption, the normal tools such as ps and top will work reasonably well.
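A minimal sketch of presizing, assuming you know the row count ahead of time (for instance from a SELECT COUNT(*)). The count and row contents below are placeholders.

```perl
use strict;
use warnings;

my $expected_rows = 10_000;    # assumed row count, e.g. from SELECT COUNT(*)

# Presize an array: assigning to $#array allocates the slots up front.
my @rows;
$#rows = $expected_rows - 1;
my $array_len = scalar @rows;    # 10000, all elements undef so far

# Presize a hash: assigning to keys() preallocates buckets.
my %index;
keys(%index) = $expected_rows;
my $key_count = scalar keys %index;    # still 0; only buckets were reserved

# Fill by index rather than push, since the array already has its length.
$rows[$_] = [ $_, "row $_" ] for 0 .. $expected_rows - 1;
```

Using push on a presized array would append after the reserved slots, which defeats the purpose.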

To see if high-level behavior such as copy-on-write is working properly, you need to stress test your server to see how much traffic load causes it to swap. You really need to use a development server for this type of testing.

Writing a stress-test program is fun! Create a program using LWP to simulate the behavior of a single user. Run a bunch of these programs at the same time to simulate the load caused by many users. You can get typical user behavior patterns by examining your server log files. This approach allows you to make impressive claims, such as "This system is scaled for two second page load times with 1000 simultaneous users."
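A bare-bones sketch of such a program, assuming LWP::UserAgent is installed. The sub names, the 30 second timeout, and the example URLs are all made up; a real test would replay request sequences taken from your logs and add think time between requests.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# One simulated user: request each URL in turn and record how long it took.
# $ua is expected to behave like an LWP::UserAgent (get() returning a response).
sub simulate_user {
    my ( $ua, @urls ) = @_;
    my @timings;
    for my $url (@urls) {
        my $start    = time();
        my $response = $ua->get($url);
        push @timings, [ $url, $response->is_success ? 1 : 0, time() - $start ];
    }
    return @timings;
}

# Run $users simulated users at once, one forked process each.
sub run_load_test {
    my ( $users, @urls ) = @_;
    require LWP::UserAgent;
    my @pids;
    for ( 1 .. $users ) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ( $pid == 0 ) {
            my $ua = LWP::UserAgent->new( timeout => 30 );
            printf "%d %s ok=%d %.3fs\n", $$, @$_ for simulate_user( $ua, @urls );
            exit 0;
        }
        push @pids, $pid;
    }
    waitpid $_, 0 for @pids;
}

# Usage: perl stress.pl 20 http://devserver.example/ http://devserver.example/search
run_load_test(@ARGV) if @ARGV >= 2;
```

Each child prints its pid, the URL, whether the request succeeded, and the elapsed time, so you can sort and summarize the output afterwards.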

It should work perfectly the first time! - toma


In reply to Re: Comparing memory requirements for hashes and arrays by toma
in thread Comparing memory requirements for hashes and arrays by pmas
