mischief has asked for the wisdom of the Perl Monks concerning the following question:

In perlfaq4, there is a question "What happens if I add or remove keys from a hash while iterating over it?" with the answer "Don't do that".

Why not?

  • Comment on Adding or removing keys while iterating over a hash

Replies are listed 'Best First'.
Re: Adding or removing keys while iterating over a hash
by mischief (Hermit) on Nov 28, 2000 at 15:27 UTC
    OK, I've just found out the answer. Apparently, according to the perl 5.6 version perlfaq4:

    lwall In Perl 4, you were not allowed to modify a hash at all while
    iterating over it.  In Perl 5 you can delete from it, but you still
    can't add to it, because that might cause a doubling of the hash table,
    in which half the entries get copied up to the new top half of the
    table, at which point you've totally bamboozled the iterator code.
    Even if the table doesn't double, there's no telling whether your new
    entry will be inserted before or after the current iterator position.
    
    Either treasure up your changes and make them after the iterator finishes,
    or use keys to fetch all the old keys at once, and iterate over the list
    of keys.
    
    I guess perlmonk's docs must be from 5.005_3. I checked on perl.com as well beforehand, but they have the same answer. Does that mean that perl 5.6 isn't considered authoritative yet?
      It's odd that it's still so restrictive, because I looked into this pretty carefully and found that although insertion in the middle of an each loop can (and does) cause big problems, there is no way that deletion can cause trouble.

      This is for two reasons. First, unlike insertion, a deletion never causes a complete reorganization of the entire hash. (This is mentioned in the part of the FAQ that you quoted.) But there's also something more subtle going on.

      The way a hash works is that it's a collection of lists of he items. Each he has a pointer to a key, to the associated value, and a pointer to the next he in the list. Every hash also keeps track of the 'current' he. When you call each, it advances the 'current item' pointer to the next he. It can do this because each he points at the next one.

      Normally, when you call delete, the memory for the deleted he is reclaimed right away. But if you delete the 'current item', there's a complication. If the he were destroyed and the memory were reclaimed right away, then the next time you called each, it wouldn't work, because it needs to find the 'next' item, and the information would have been stored in the he that you destroyed.

      So there is special-case code inside the delete function that specifically checks to see if you are deleting the 'current item', and if so, it postpones the destruction of the he until after the next time you call each. There's more special-case code inside of each that checks to see if you already used delete on the current item, and if so, it belatedly destroys the he after advancing the 'current item'.

      In other words, Larry went to some special effort to make sure that you could delete nodes inside of an each loop, so it's particularly odd that this isn't documented.

      In recognition of this, the Perl 5.6.1 the manual will include the following:

      Exception: It is always safe to delete the item most recently returned by each(), which means that the following code will work:
      while (($key, $value) = each %hash) { print $key, "\n"; delete $hash{$key}; # This is safe }
      But I still don't know why arbitrary deletions are warned against, since they do always work.

      guess perlmonk's docs must be from 5.005_3. I checked on perl.com as well beforehand, but they have the same answer. Does that mean that perl 5.6 isn't considered authoritative yet?
      Most of us with real machines to run are holding at 5.5.3, waiting patiently for 5.6.1 to be released in a few more weeks (as they've kept saying for three months now {grin}), having considered 5.6.0 as only for "early adopters". And of course 5.7.0 was "over the bleeding edge".

      -- Randal L. Schwartz, Perl hacker

        Why are you using 5.5.3 instead of 5.6.0? You say that 5.6.0 is for "early adopters", but personally I don't know any reasons for not using the later versions (not to say that there aren't any reasons; I'm just ignorant of what they are). I'm by no means a perl guru, so as far as I can see 5.6.0 just has some nice features that I would like to use. All the modules I want to use are compatible with 5.6.0, and there are a few little things that I have been using in my scripts like our(), use warnings and delete()ing array elements. None of these are really essential to what I'm doing of course, but they're nice.

        I'm quite happy to accept that 5.6.0 isn't ready for use on production machines, but could someone tell me why, or where to find out why? Thanks in advance for any pointers.

        (Apologies for the late reply, but I haven't really been in a position to until now.)

Re: Adding or removing keys while iterating over a hash
by dws (Chancellor) on Nov 28, 2000 at 23:02 UTC
    The other reason not to do it is that you're liable to confuse the heck out of whoever picks up the code next. The remove case is easy to understand, but it's not at all clear to the average reader (who understands that hashes are unordered) what it means to do insert a new key in the middle of an iteration. "Do I see the key again later in the iteration? Uh... Uh... We'll have to run it and see."

    I hate being backed into "we'll have to run it and see" mode, and try not do to that to others.