in reply to using values and keys functions

Other people have answered your main question. Here are answers to your peripheral questions.

To understand the answers, you need to know how a hash works internally. A hash has a set of buckets, and a function (the hash function) decides which bucket each key goes into. Ideally the assignment of keys to buckets looks random, so that if you have enough buckets for your keys, no bucket holds very many of them. That means inserting, retrieving and deleting are fast, because you only have to work with the handful of keys in one bucket. But the assignment is in fact deterministic: for a given hash function and number of buckets, the same key always goes to the same bucket.

(Technical note: Perl changes the number of buckets if the hash gets too many keys, thereby keeping the number of keys per bucket down. This operation is known as a "hash split" and is expensive. But it is also rare, and its cost averages out to a constant per insert. Perl does not try to reclaim memory if a hash shrinks after having grown.)
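
If it helps to see the bucket idea in code, here is a toy sketch in plain Perl. It is only a model: the bucket count, the hash function and all the `toy_*` names are made up for illustration, and Perl's real implementation is in C and works differently in its details.

```perl
use strict;
use warnings;

# Toy model only: a fixed number of buckets and a deliberately simple,
# hypothetical hash function. This is NOT the function Perl uses.
my $NUM_BUCKETS = 8;
my @buckets;
push @buckets, [] for 1 .. $NUM_BUCKETS;

# Deterministic: the same key always produces the same number.
sub toy_hash {
    my ($key) = @_;
    my $h = 0;
    $h = ($h * 33 + ord $_) % 65_536 for split //, $key;
    return $h;
}

sub bucket_for { return toy_hash($_[0]) % $NUM_BUCKETS }

# Insert or update: only the one bucket chosen for the key is scanned.
sub toy_store {
    my ($key, $value) = @_;
    my $bucket = $buckets[ bucket_for($key) ];
    for my $pair (@$bucket) {
        if ($pair->[0] eq $key) { $pair->[1] = $value; return }
    }
    push @$bucket, [ $key, $value ];
    return;
}

# Lookup: again, only one bucket is scanned.
sub toy_fetch {
    my ($key) = @_;
    for my $pair (@{ $buckets[ bucket_for($key) ] }) {
        return $pair->[1] if $pair->[0] eq $key;
    }
    return;
}

# Walk the buckets in order, and each bucket's contents in order.
# Both subs walk the same way, so the two lists line up.
sub toy_keys   { map { $_->[0] } map { @$_ } @buckets }
sub toy_values { map { $_->[1] } map { @$_ } @buckets }

toy_store($_, length $_) for qw(apple banana cherry date);
print join(' ', toy_keys()),   "\n";
print join(' ', toy_values()), "\n";
```

The order printed is whatever the toy hash function produces, not insertion order, and an attacker who knew the function could pick keys that all land in one bucket and make every lookup scan a long list.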

  1. "i thought the ordering of keys/values are or will be random"
     No. Perl walks the buckets in order, and for each bucket walks the contents in order. Since Perl does this the same way for both keys and values, the order will match between them.
  2. "my understanding is that the order of keys and values being returned is affected by insertion order."
     Yes. The assignment of keys to buckets is not affected by insertion order, but the order of keys within a bucket can be. (OK, I lied there. In at least some versions of Perl, the order in which keys are added can determine whether a split has happened.)
  3. "i also thought that soon if not already the order returned is further randomized for some sort of security thing."
     Yes. In recent versions of Perl, the hashing function that is used changes every time you run Perl. This is to prevent people from sending you carefully constructed data that causes your keys to all go into one bucket. Since they can't know what hashing function you're using, they have no way to construct a malicious dataset except by accident (and the odds against it are high).
  4. "...will the following really work?"
     Yes. That is because Perl actually runs values first to generate a list, then keys to generate the list of variables, then proceeds to assign the one to the other.
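
The snippet from the original question is not shown here, so the following is only a stand-in with made-up names (`%color_of`, `%copy`). It sketches the two points above: `keys` and `values` line up as long as the hash is not modified in between, and a slice assignment can rely on that.

```perl
use strict;
use warnings;

my %color_of = (apple => 'red', banana => 'yellow', plum => 'purple');

# The order is arbitrary, but it is the same arbitrary order for both
# calls, provided the hash is not changed between them.
my @k = keys %color_of;
my @v = values %color_of;

my $in_step = 1;
for my $i (0 .. $#k) {
    $in_step = 0 unless $color_of{ $k[$i] } eq $v[$i];
}
print "keys and values are in step\n" if $in_step;

# Slice assignment in the shape described in answer 4: values on the
# right-hand side, keys naming the slots on the left.
my %copy;
@copy{ keys %color_of } = values %color_of;
```

After this runs, `%copy` should hold the same pairs as `%color_of`.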

Re^2: using values and keys functions
by monarch (Priest) on Jun 09, 2005 at 07:25 UTC
    I've wondered about the security aspects of changing the hashing function..

    Whilst I acknowledge that the order of items retrieved from a hash could possibly pose a security risk, I can't actually think of any practical areas where randomising the hash function actually assists.

    Surely if someone is putting together a hash that is at risk of attack then they should filter the data somehow?

    Wouldn't a more fixed hashing function be of greater benefit? Are there any programmers today who take advantage of the order in which the hash is output consistently across executions?

    note this is just a musing.. not an actual Perl change-request..

    update: Thanks for the replies below, very informative!

      You might find this page interesting. Look for the links from it (Dominus' reference, and another to a post here, Hash Clash on purpose by iburrell).


      Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
      Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
      "Science is about questioning the status quo. Questioning authority".
      The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

      BrowserUk answered the security question.

      Note that filtering against this attack is virtually impossible. Without extensive analysis you won't know what could possibly be a problem, and it could affect any hash at all that gets lots of data. Hashes are documented to be fast, and it is Perl's job to make them work out that way.

      As for people relying on the order from the hash, I'd consider breaking that to mostly be a benefit. Anyone who relied on hash order being consistent was guaranteeing that their code would break when you change versions of Perl. (Perl's hash function changed fairly frequently, though admittedly not as often as it does now.) With the new change, people catch their mistake earlier. A real example of this mistake that I believe bit Ovid was a poorly written test that assumed the order in which keys came back out from a hash.
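
To make the test mistake concrete, here is a minimal sketch with made-up data (`%seen` is hypothetical, and this is not the actual test that bit anyone). The fragile version compares against whatever order `keys` happens to return; the robust version imposes an order first.

```perl
use strict;
use warnings;

my %seen = (alpha => 1, beta => 2, gamma => 3);

# Fragile: passes or fails depending on the order this particular
# perl (and this particular run) hands the keys back in.
# my $got = join ',', keys %seen;

# Robust: sort before comparing, so hash order no longer matters.
my $got = join ',', sort keys %seen;
print "ok\n" if $got eq 'alpha,beta,gamma';
```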

      Though, admittedly, it did cause a few problems for people who checked whether a hash was the same as one they had seen previously by stringifying it with Storable and doing a string compare against the old result. However, you can fix that by setting $Storable::canonical to a true value.
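
A minimal sketch of that fix, assuming the comparison is done on the output of Storable's `freeze` (the hashes and their contents are made up for illustration):

```perl
use strict;
use warnings;
use Storable qw(freeze);

# Same contents, built in a different insertion order.
my (%first, %second);
$first{$_}  = "v$_" for qw(a b c d e);
$second{$_} = "v$_" for qw(e d c b a);

my $same;
{
    # With canonical set, Storable stores hash elements sorted by key,
    # so the frozen strings no longer depend on internal hash order.
    local $Storable::canonical = 1;
    $same = freeze(\%first) eq freeze(\%second);
}
print $same ? "same\n" : "different\n";
```

Using `local` keeps the setting confined to the comparison, which is a design choice in this sketch rather than a requirement; setting it once globally works too.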