in reply to Bulk hash population order

One should never assume any particular visitation order in a plain perl hash structure. This includes how they're printed, how the each iterator works, how the for will walk the hash, nothing.

The internal structure is well-documented for those who are hacking the interpreter, but really none of the concern of a perl script that uses it. Perl's interpreter is free to rearrange at will, to ensure either memory efficiency or lookup efficiency. Some versions of perl will even hash differently each time the script is run, to thwart attempts to "attack" the interpreter by generating very unbalanced hashes.

There is a specialized version of the hash available, that will keep a second structure around internally to remember what order you inserted things, or how you liked things ordered. This is a subclass, a special case, and useful if your script must have that knowledge or power.

Update: It seems like some people are downvoting, perhaps assuming that I didn't read the question because I'm talking about visitation, and the question talks about bulk insertion. I phrased this intentionally; they're the same thing as far as dealing with a data structure goes. Don't rely on a quirk of the parser to populate hashes in a particular order, either. Just because the ( x => y ) syntax on the right-hand-side is implemented as a list, and today's interpreter happens to walk that list in order to populate the hash, and any duplicate values of x will get written multiple times, does not mean that making these assumptions are a good coding practice. You're inserting into a hash. If you expect a lot of overwrites, and this is important to you, express the appropriate order of insertions to manage these redundant overwrites carefully. While it seems unlikely that the interpreter will break this assumption tomorrow, that's more likely due to the fact that the perl hackers know there are a million poorly written scripts that do make this assumption out there. As a benefit, it makes your code more literate, more self-explanatory and clear.

--
[ e d @ h a l l e y . c c ]

Replies are listed 'Best First'.
Re^2: Bulk hash population order
by jdporter (Paladin) on Nov 27, 2007 at 15:23 UTC

    Maybe it's good advice to say "never make assumptions about ordering wrt hashes", but in this case the advice is misleading, because the form

    %h = ( %h, foo => 2 );
    always* results in the hash %h containing foo => 2, regardless of what value (if any) $h{'foo'} had previously. Fletch's explanation is correct.

    * That is, unless %h is tied to other behavior, e.g. altering or deleting keys or values according to some function.

    A word spoken in Mind will reach its own level, in the objective world, by its own weight
Re^2: Bulk hash population order
by grinder (Bishop) on Nov 27, 2007 at 17:33 UTC
    It seems like some people are downvoting, perhaps assuming that I didn't read the question because I'm talking about visitation, and the question talks about bulk insertion. I phrased this intentionally; they're the same thing as far as dealing with a data structure goes.

    It could be you're being dinged because the information is incorrect.

    The problem has nothing to do with hash traversal. What is happening is hash-to-list flattening (and back again). The keys and values are being flattened into a series of list pairs, and then additional list pairs are being tacked on the end.

    In a subsequent step (the assignment to a hash), the pairs are paired up again, and the results assigned to a hash. The values of keys coming later in the list overwrite the values of keys set earlier in the list. There is no voodoo involved, it's the way list iteration works. It's not an assumption, it's the only way it could ever work. There's no sane algorithm that could replace it.

    List flattening is deterministic, there's no two ways about it (literally :)

    • another intruder with the mooring in the heart of the Perl

Re^2: Bulk hash population order
by JakeIII (Novice) on Nov 27, 2007 at 14:19 UTC
    So, in this simple example ...
    my %lilhash = ( 'A' => 1, 'A' => 2, 'A' => 3 ); print $lilhash{'A'};
    ... you are saying that there's absolutely no guarantee that the code will print ...

    3

    ?
      I'm saying it's a bad assumption to make.

      We know that the perl interpreter will parse this equivalently to:

      my @hiddenlist = ( 'A', 1, 'A', 2, 'A', 3 ); my %lilhash; while (@hiddenlist) { my $hiddenkey = shift @hiddenlist; $lilhash{$hiddenkey} = shift @hiddenlist; } print $lilhash{'A'};
      However, the kinds of assumptions you are making have been made by many folks for years, and it's this kind of assumption that can stymy any interesting optimizations that the interpreter could do with bulk insertions.

      I just think it's a bad idea when the word "order" and the word "hash" come anywhere near each other to start making such assumptions. Regardless of how safe or well-entrenched the idiom may be, my advice is: the hash is unordered and the list is ordered and if you care about order, be explicit.

      --
      [ e d @ h a l l e y . c c ]

        Erm, the list is ordered, and it's the order the keys from that list are used to initialize the hash that's important. If you're going to allow multiple occurrences of any particular key in the initializing data then you're going to have to have a rule for which one "wins" when there are duplicates.

        The only sane rules are going to be "first occurrence wins" or "last occurrence wins" (either of which would allow some variant of the technique in question, so you'd do %hash = ( existing => "new value", %hash ) in bizzaro-perl). The latter of those rules is what perl uses and this is alluded to in perldata, unfortunately in passing rather than more explicitly; see the end of the section "List value constructors":

        Note that just because a hash is initialized in that order doesn't mean that it comes out in that order.

        So yes you shouldn't make any assumptions once it's gone in, but you are guaranteed (weakly, unfortunately) about the order pairs go in from a LIST.

        Update: Not to mention this is the "recommended" way to do default values when you're using a hashref to do named arguments to a sub. See pp185-187 of Perl Best Practices.