Have you done any benchmarking experiments comparing hash table implementations? You know! Experiments with lots of unique additions. Experiments that involve lots of collisions (hash key already exists). Deletion operations. Conditional deletion operations etc.

In general, any descent hash table implementation worthy of the name, is effectively (amortised) O(1) for all those operations. But of course, the devil is in the detail. As general purpose, anything you like as a key and anything you like as a payload, Perl's implementation is (probably) about as good as it gets. It main downside is that it's performance and generality comes at the cost of substantial memory usage. A worthwhile trade when speed and generality are the major criteria.

Can Perl's hash implementation be beat? Could it be optimised?

It absolutely depends upon the usage you are making of them; and for what criteria you are trying to optimise.

My current effort have 3 aims: 1) to trade (a little) speed for significant memory reduction, by reducing the size of the internal structures used; 2) further reduce memory consumption by restricting keys to 255 characters thereby allowing me to get away with a single byte for key length; 3) avoid the need to use memory and cpu to sort the keys, by implementing the hash such that it supports in order traversal of the keys. All, whilst retaining the generality of key containing any byte values & supporting any SV as a payload. I'm currently bogged down in the typical (for me) XS memory & refcount management issues.

What about on disk Hash tables using MLDBM? Could this be made to run any faster? How does it compare with using a database equivalent (e.g. MySQL)?

I don't think performance was a high criteria for the implementation of MLDBM. I think they were aiming for a) basic CRUD behaviour (no referential constraints or normal form requirements); b) with nested hash-like ease of use (avoiding SQL complexities and bottlenecks); c) whilst retaining ACID reliability in a multi-user environment, with single-file simplicity.

And I think they've got pretty close to those goals. To target speed would be to compromise their other goals.

The bottom line is that it is nearly always possible to beat out an RDBMS for performance when you have specific knowledge of the data and usage requirements. But doing it requires custom effort for each specific set of requirements. RDBMSs (and Perl's hashes) are aimed at performing well across the widest possible range of general requirements.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

In reply to Re^3: Perl's hash table implementation by BrowserUk
in thread Perl's hash table implementation by jc

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.