orthanc has asked for the wisdom of the Perl Monks concerning the following question:

Hi All,

I've been working a lot with hashes recently and as you all know you can't get stuff out in insertion order unless you use Tie::IxHash.

I know you will all probably say you should use arrays if you want a certain order, but what if you want the hash functionality?

The main point of this question being if perl uses its own indexing method to store the hash internally, why not just have it tied internally? Does it make a difference?

Any suggestions?

Orthanc

Replies are listed 'Best First'.
Re: Hash Internals
by japhy (Canon) on Oct 19, 2000 at 16:13 UTC
    The ordered hash structure uses three things:
    • a hash of key-index pairs
    • an array of keys
    • an array of values
    Thus:
    $orderedhash{$newkey} = $newvalue; # is really doing $next_index = @KEYS; $KEYINDEX{$key} = $next_index; push @KEYS, $key; push @VALUES, $value; $orderedhash{$oldkey} = $newvalue; # is really doing $key_index = $KEYINDEX{$key}; $VALUES[$key_index] = $value;
    Deletions require splicing the two arrays, and then lowering the values in the %KEYINDEX hash. And if you want to allow for in-the-middle insertion, that requires splicing again, and raising the values in the hash.

    This is obviously a slow structure and a slow process. It does not make for optimization. Hashes are designed so that keys can be looked up quickly, and Perl's hash structure might change soon to make Perl realize that a non-existent key isn't in the hash sooner (that's good).

    $_="goto+F.print+chop;\n=yhpaj";F1:eval
      what is wrong with an implementation that just uses a hash and a list to keep the order of keys? FETCH-ing should be faster (a single hash lookup); STORE-ing \ would be faster (only replace the hash value), KEYS would be the same, VALUEs would be a little slower, EACH also slightly slower...

      i guess the big question is whether given an existing key-value, does replacing that key's value keep its original position or does it move to the 'end'?

        That's ok for a limited design, where you'll only be adding to the front or back.

        $_="goto+F.print+chop;\n=yhpaj";F1:eval
Re: Hash Internals
by orthanc (Monk) on Oct 19, 2000 at 16:30 UTC

    So is efficiency the main reason why the hashes are not tied internally?

    And if so, does Tie::IxHash slow things down a lot? and i'm asking this question assuming that the algorithm mentioned above is similar to the method used in the Tie::IxHash.

      Yes, the method I described is VERY similar to Tie::IxHash, if not the same. I wrote my own implementation to see if I understood it (and did a presentation on it at YAPC 19100. Ordered hashes are slow because you totally disregard the work the hashing algorithm does to make the accesses fast. And because the setup uses those three data structures, and there's not one that maps key to value, you have to first use the hash to get key to index, and then use the array of values to get index to value. Ick.

      $_="goto+F.print+chop;\n=yhpaj";F1:eval