JakeIII has asked for the wisdom of the Perl Monks concerning the following question:

So, I create a big hash full of lots of default values. (Imagine a big hash anyway)
my %bighash = ( 'A' => 1, 'B' => 2, 'C' => 3 );
... and for some reason or other, I need to change a value or perhaps even add a completely new one ...
$bighash{'C'} = 'some new value';
$bighash{'D'} = 4;
... which is no big deal when it isn't huge. Anyway, will the following ALWAYS do what the casual observer might expect (i.e. replace the values in %bighash with the new ones)? Or is it possible that some versions of perl on some machines may sometimes repopulate %bighash backwards, or in no particular order, so that the new values for the changed keys get overwritten by the old ones?
%bighash = ( %bighash, 'C' => 'some new value', 'D' => 4 );
Or is there an even simpler, better, more obvious way to "bulk" append a hash and save a few keystrokes that I'm clearly missing? This technique seems to work, but I suppose that I'm looking for a communal blessing of sorts.

Jake

Re: Bulk hash population order
by perrin (Chancellor) on Nov 27, 2007 at 14:14 UTC
    Another cute way to do this is a hash-slice:
    @bighash{'C', 'D'} = ('some new value', 4);
    I think it's more efficient, but I haven't tested it.
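perrin's efficiency guess can be checked with the core Benchmark module. The sketch below is only a harness: the 1,000-key hash is invented test data, and timings depend on the perl version and machine, so no result is claimed here. Both updates are idempotent, which lets every iteration reapply them to the same hash without re-copying it.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical test data: 1,000 default keys named key1 .. key1000.
my %defaults = map { ("key$_" => $_) } 1 .. 1000;

# One copy per technique; reapplying either update leaves the hash
# unchanged, so each benchmark iteration starts from the same state.
my %slice_hash  = %defaults;
my %relist_hash = %defaults;

# Run each update for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    slice  => sub { @slice_hash{'C', 'D'} = ('some new value', 4) },
    relist => sub {
        %relist_hash = (%relist_hash, 'C' => 'some new value', 'D' => 4);
    },
});
```

One would expect the slice to come out ahead, since the flatten-and-reassign form rebuilds the whole hash on every call, but the table printed on your own machine is the real answer.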
      Yes, a hash slice (or a loop) is the way to go, imho.

      Assuming you have the new values you want in %newbits:

      @bighash{keys %newbits} = values %newbits;
      or
      $bighash{$_} = $newbits{$_} for keys %newbits;
      Note that keys and values return their elements in the same order (as long as the hash isn't modified between the two calls), so the above works.
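Here is a small runnable sketch of both suggestions side by side. The contents of %newbits are made up for illustration; the point is that the slice and the loop produce the same merged hash, and that neither touches keys A and B.

```perl
use strict;
use warnings;

my %bighash = ( 'A' => 1, 'B' => 2, 'C' => 3 );

# Hypothetical additions/overrides.
my %newbits = ( 'C' => 'some new value', 'D' => 4 );

# Hash slice: keys and values walk %newbits in the same order
# because %newbits is not modified between the two calls.
my %via_slice = %bighash;
@via_slice{ keys %newbits } = values %newbits;

# Explicit loop: one assignment per new key.
my %via_loop = %bighash;
$via_loop{$_} = $newbits{$_} for keys %newbits;

# Print both results in sorted key order so the output is stable.
for my $h (\%via_slice, \%via_loop) {
    print join(', ', map { "$_=$h->{$_}" } sort keys %$h), "\n";
}
# prints "A=1, B=2, C=some new value, D=4" twice
```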
Re: Bulk hash population order
by Fletch (Bishop) on Nov 27, 2007 at 13:56 UTC

    Yes, it will work. A hash in list context evaluates to a list of key/value pairs (in an arbitrary order), and initializing a hash from a list of key/value pairs will overwrite earlier instances of duplicate keys with the last one seen.
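A quick way to see both halves of that explanation in action (a minimal demonstration using the data from the question, not anything beyond what Fletch describes): duplicate keys in a list assignment resolve to the last pair, and the question's idiom therefore overwrites C and adds D.

```perl
use strict;
use warnings;

# Duplicate keys in a list assignment: the last pair wins.
my %lilhash = ( 'A' => 1, 'A' => 2, 'A' => 3 );
print "$lilhash{A}\n";    # prints 3

# The idiom from the question: the old pairs come first in the
# flattened list, so the trailing pairs overwrite C and add D.
my %bighash = ( 'A' => 1, 'B' => 2, 'C' => 3 );
%bighash = ( %bighash, 'C' => 'some new value', 'D' => 4 );
print join(', ', map { "$_=$bighash{$_}" } sort keys %bighash), "\n";
# prints A=1, B=2, C=some new value, D=4
```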

Re: Bulk hash population order
by halley (Prior) on Nov 27, 2007 at 13:55 UTC
    One should never assume any particular visitation order in a plain perl hash structure. That covers everything: how hashes are printed, how the each iterator works, how a for loop walks the hash.

    The internal structure is well documented for those who are hacking on the interpreter, but it is really none of the concern of a perl script that uses it. Perl's interpreter is free to rearrange things at will, for either memory efficiency or lookup efficiency. Some versions of perl will even hash differently each time the script is run, to thwart attempts to "attack" the interpreter by generating very unbalanced hashes.

    There is a specialized version of the hash available that keeps a second structure around internally to remember the order in which you inserted things, or how you like things ordered. This is a subclass, a special case, and useful if your script must have that knowledge or power.
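The idea of "a second structure that remembers insertion order" can be sketched by hand. CPAN modules such as Tie::IxHash package this up as a tied hash; the code below is not that module, just a hypothetical helper (`ordered_store`) that keeps a companion array of keys next to a plain hash.

```perl
use strict;
use warnings;

# Hypothetical helper: remember first-insertion order in a companion
# array while the hash itself keeps providing fast lookups.
my (%h, @order);

sub ordered_store {
    my ($key, $value) = @_;
    push @order, $key unless exists $h{$key};   # record new keys only
    $h{$key} = $value;                          # overwrite as usual
}

ordered_store($_->[0], $_->[1]) for ['C', 3], ['A', 1], ['B', 2], ['C', 'new'];

# Walk in insertion order rather than hash order.
print join(', ', map { "$_=$h{$_}" } @order), "\n";
# prints C=new, A=1, B=2
```

Deleting keys would also need to update @order, which this sketch does not handle; that bookkeeping is exactly what a proper tied implementation takes care of.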

    Update: It seems like some people are downvoting, perhaps assuming that I didn't read the question because I'm talking about visitation, and the question talks about bulk insertion. I phrased this intentionally; they're the same thing as far as dealing with a data structure goes. Don't rely on a quirk of the parser to populate hashes in a particular order, either. Just because the ( x => y ) syntax on the right-hand side is implemented as a list, today's interpreter happens to walk that list in order to populate the hash, and any duplicate values of x get written multiple times, that does not mean making these assumptions is good coding practice. You're inserting into a hash. If you expect a lot of overwrites and they matter to you, express the order of insertions explicitly so that these redundant overwrites are managed carefully. While it seems unlikely that the interpreter will break this assumption tomorrow, that's mostly because the perl hackers know there are a million poorly written scripts out there that do make it. As a bonus, being explicit makes your code more literate, more self-explanatory and clearer.
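One way to "express the order of insertions" explicitly, as suggested above, is to apply each source of values as a separate, visibly ordered step. This is a sketch with invented configuration layers (%defaults, %site, %user); each layer is merged with a hash slice, and the loop's list spells out which layer wins.

```perl
use strict;
use warnings;

# Hypothetical configuration layers, listed lowest to highest priority.
my %defaults = ( colour => 'blue', size => 'M', lang => 'en' );
my %site     = ( size => 'L' );
my %user     = ( colour => 'red' );

# Apply each layer in an explicit, written-down order; later layers
# overwrite earlier ones key by key.
my %config;
for my $layer (\%defaults, \%site, \%user) {
    @config{ keys %$layer } = values %$layer;
}

print join(', ', map { "$_=$config{$_}" } sort keys %config), "\n";
# prints colour=red, lang=en, size=L
```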

    --
    [ e d @ h a l l e y . c c ]

      Maybe it's good advice to say "never make assumptions about ordering wrt hashes", but in this case the advice is misleading, because the form

      %h = ( %h, foo => 2 );
      always* results in the hash %h containing foo => 2, regardless of what value (if any) $h{'foo'} had previously. Fletch's explanation is correct.

      * That is, unless %h is tied to other behavior, e.g. altering or deleting keys or values according to some function.

      A word spoken in Mind will reach its own level, in the objective world, by its own weight
      It seems like some people are downvoting, perhaps assuming that I didn't read the question because I'm talking about visitation, and the question talks about bulk insertion. I phrased this intentionally; they're the same thing as far as dealing with a data structure goes.

      It could be you're being dinged because the information is incorrect.

      The problem has nothing to do with hash traversal. What is happening is hash-to-list flattening (and back again). The keys and values are being flattened into a series of list pairs, and then additional list pairs are being tacked on the end.

      In a subsequent step (the assignment to a hash), the pairs are paired up again, and the results assigned to a hash. The values of keys coming later in the list overwrite the values of keys set earlier in the list. There is no voodoo involved, it's the way list iteration works. It's not an assumption, it's the only way it could ever work. There's no sane algorithm that could replace it.
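The two steps described above (flatten, then pair back up) can be observed directly. The 100-key hash below is made-up data chosen so that its internal order is visibly arbitrary; whatever that order is, the appended pair always sits at the end of the flattened list, so it is stored last.

```perl
use strict;
use warnings;

# Hypothetical hash with many keys, so its internal order is arbitrary.
my %h = map { ("k$_" => $_) } 1 .. 100;

# Flatten: the hash's pairs come out in some arbitrary order, but the
# explicitly appended pair is always at the end of the list.
my @flat = ( %h, 'k50' => 'override' );
print "@flat[-2, -1]\n";         # prints k50 override

# Pair the list back up: assignment runs left to right, so the
# appended value for k50 is the last one stored.
%h = @flat;
print "$h{k50}\n";               # prints override
print scalar(keys %h), "\n";     # prints 100
```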

      List flattening is deterministic, there's no two ways about it (literally :)

      • another intruder with the mooring in the heart of the Perl

      So, in this simple example ...
      my %lilhash = ( 'A' => 1, 'A' => 2, 'A' => 3 );
      print $lilhash{'A'};
      ... you are saying that there's absolutely no guarantee that the code will print ...

      3

      ?
        I'm saying it's a bad assumption to make.

        We know that the perl interpreter will parse this equivalently to:

        my @hiddenlist = ( 'A', 1, 'A', 2, 'A', 3 );
        my %lilhash;
        while (@hiddenlist) {
            my $hiddenkey = shift @hiddenlist;
            $lilhash{$hiddenkey} = shift @hiddenlist;
        }
        print $lilhash{'A'};
        However, the kind of assumption you are making has been made by many folks for years, and it's this kind of assumption that can stymie any interesting optimizations the interpreter could do with bulk insertions.

        I just think it's a bad idea to start making such assumptions whenever the words "order" and "hash" come anywhere near each other. Regardless of how safe or well-entrenched the idiom may be, my advice is: the hash is unordered, the list is ordered, and if you care about order, be explicit.

        --
        [ e d @ h a l l e y . c c ]

Re: Bulk hash population order
by duelafn (Parson) on Nov 27, 2007 at 17:07 UTC

    One can even make it pretty with a simple prototyped function:

    sub hpush(\%@) {
        my ($h, $k, $v) = (shift);
        $$h{$k} = $v while ($k, $v) = splice(@_, 0, 2);
    }

    # ... later
    my %bighash = ( A => 1, B => 2, C => 3 );
    hpush %bighash, B => 3, D => 4;
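If you would rather avoid prototypes, the same idea works with an explicit hash reference. `hpush_ref` below is a hypothetical variant written for illustration, not part of the original reply; it also rejects an odd-length list of pairs instead of silently storing an undef value.

```perl
use strict;
use warnings;

# Hypothetical prototype-free variant of hpush: takes a hash reference
# followed by key/value pairs, and refuses an odd-length pair list.
sub hpush_ref {
    my ($h, @pairs) = @_;
    die "hpush_ref: odd number of key/value elements\n" if @pairs % 2;
    while (my ($k, $v) = splice(@pairs, 0, 2)) {
        $h->{$k} = $v;    # later pairs overwrite earlier ones
    }
    return $h;
}

my %bighash = ( A => 1, B => 2, C => 3 );
hpush_ref(\%bighash, B => 3, D => 4);
print join(', ', map { "$_=$bighash{$_}" } sort keys %bighash), "\n";
# prints A=1, B=3, C=3, D=4
```

The trade-off is one extra backslash at the call site in exchange for a function that behaves the same whether or not perl has seen its declaration yet.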

    Good Day,
        Dean

Re: Bulk hash population order
by localfilmmaker (Novice) on Nov 28, 2007 at 07:44 UTC
    With all the posts here, you've pretty much got your answer as to whether this will work. But I would also suggest making it readable: make it obvious what your code is doing by giving your variables self-explanatory names and by commenting it.
    my %defaults = get_defaults();
    my %bighash  = get_big_hash();

    # Set default values to our bighash
    %bighash = (%bighash, %defaults);
    The point here is to make that assignment statement as simple to read as possible so there is no confusion about what you are doing there and why.
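With this idiom, which hash goes last decides which values survive, so that choice is worth making visibly deliberate. The sketch below uses invented data in place of the hypothetical get_defaults() and get_big_hash() and shows both orderings.

```perl
use strict;
use warnings;

# Hypothetical data standing in for get_defaults() and get_big_hash().
my %defaults = ( colour => 'blue', size => 'M' );
my %bighash  = ( colour => 'red' );

# Defaults first, explicit values last: existing values in %bighash win
# and the defaults only fill in keys that are missing.
my %filled = ( %defaults, %bighash );

# The reverse order lets every default overwrite %bighash.
my %forced = ( %bighash, %defaults );

print "filled: $filled{colour}/$filled{size}\n";   # prints filled: red/M
print "forced: $forced{colour}/$forced{size}\n";   # prints forced: blue/M
```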
Re: Bulk hash population order
by locked_user sundialsvc4 (Abbot) on Nov 28, 2007 at 20:17 UTC

    Amen! Make it clear!

    If you want to change the values in a hash, use hash notation consistently throughout ... precisely so that Perl in its DWIM-crazed way does not decide that you “meant” something that you didn't even know you were doing.

    And then, once you know you have a hash, and that no goofy hash-to-list-to-hash magic is happening behind your back, always assume that the keys will be retrieved “in no particular order.” A hash is intended to be used as a high-speed, random-access data structure.
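In practice, "assume no particular order" means imposing one yourself whenever it matters, most simply by sorting the keys. A minimal illustration with the question's data:

```perl
use strict;
use warnings;

my %bighash = ( 'C' => 'some new value', 'A' => 1, 'D' => 4, 'B' => 2 );

# keys() returns the keys in no particular order; sort them whenever
# output or processing order matters.
for my $key (sort keys %bighash) {
    print "$key => $bighash{$key}\n";
}
```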

Re: Bulk hash population order
by toolic (Bishop) on Nov 27, 2007 at 13:55 UTC
    Update: what was I thinking!

    It seems simpler just to omit %bighash from the right-hand side of the assignment:

    %bighash = ( 'C' => 'some new value', 'D' => 4 );

      Assigning to a whole hash (%hash = LIST) clobbers the existing contents; what he's trying to do is append new pairs and/or overwrite existing ones. Your code clobbers the rest of the contents of %bighash, leaving it with only those two keys rather than replacing just the old pairs for keys C and D.
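The difference is easy to confirm. This small check (using the question's data) compares whole-hash assignment with a hash slice.

```perl
use strict;
use warnings;

my %bighash = ( 'A' => 1, 'B' => 2, 'C' => 3 );

# Whole-hash assignment replaces the contents: A and B are gone.
my %clobbered = %bighash;
%clobbered = ( 'C' => 'some new value', 'D' => 4 );

# A hash slice touches only the named keys.
my %updated = %bighash;
@updated{'C', 'D'} = ('some new value', 4);

print scalar(keys %clobbered), " vs ", scalar(keys %updated), " keys\n";
# prints 2 vs 4 keys
```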