OK, I've just found out the answer. Apparently, according to the perl 5.6 version perlfaq4:
lwall In Perl 4, you were not allowed to modify a hash at all while
iterating over it. In Perl 5 you can delete from it, but you still
can't add to it, because that might cause a doubling of the hash table,
in which half the entries get copied up to the new top half of the
table, at which point you've totally bamboozled the iterator code.
Even if the table doesn't double, there's no telling whether your new
entry will be inserted before or after the current iterator position.
Either treasure up your changes and make them after the iterator finishes,
or use keys to fetch all the old keys at once, and iterate over the list
of keys.
I guess perlmonk's docs must be from 5.005_3. I checked on perl.com as well beforehand, but they have the same answer. Does that mean that perl 5.6 isn't considered authoritative yet? | [reply] |
It's odd that it's still so restrictive, because
I looked into this pretty carefully and found that
although insertion in the middle of an each
loop can (and does) cause big problems, there is no way that
deletion can cause trouble.
This is for two reasons. First, unlike insertion,
a deletion never causes a complete reorganization
of the entire hash. (This is mentioned in the part of
the FAQ that you quoted.) But there's also something
more subtle going on.
The way a hash works is that it's a collection of lists of
he items. Each he has a pointer to a
key, to the associated value, and a pointer to the next he
in the list. Every hash also keeps track of the 'current' he.
When you call each, it advances the 'current item'
pointer to the next he. It can do this because
each he points at the next one.
Normally, when you call delete, the memory
for the deleted he is reclaimed right away. But
if you delete the 'current item', there's a complication.
If the he were destroyed and the memory
were reclaimed right away, then the next time you
called each, it wouldn't work, because
it needs to find the 'next' item, and the information
would have been stored in the he that you
destroyed.
So there is special-case code inside the delete
function that specifically checks to
see if you are deleting the 'current item', and if so, it
postpones the destruction of the he until after
the next time you call each. There's more
special-case code inside of each that checks to
see if you already used delete on the current item, and
if so, it belatedly destroys the he after
advancing the 'current item'.
In other words, Larry went to some special effort to make sure that you
could delete nodes inside of an each loop,
so it's particularly odd that this isn't documented.
In recognition of this, the
Perl 5.6.1 the manual will include the following:
Exception: It is always safe to delete the item most recently
returned by each(), which means that the following code will work:
while (($key, $value) = each %hash) {
print $key, "\n";
delete $hash{$key}; # This is safe
}
But I still don't know why arbitrary deletions are
warned against, since they do always work.
| [reply] [d/l] |
| [reply] |
Why are you using 5.5.3 instead of 5.6.0? You say that 5.6.0 is for "early adopters", but personally I don't know any reasons for not using the later versions (not to say that there aren't any reasons; I'm just ignorant of what they are). I'm by no means a perl guru, so as far as I can see 5.6.0 just has some nice features that I would like to use. All the modules I want to use are compatible with 5.6.0, and there are a few little things that I have been using in my scripts like our(), use warnings and delete()ing array elements. None of these are really essential to what I'm doing of course, but they're nice.
I'm quite happy to accept that 5.6.0 isn't ready for use on production machines, but could someone tell me why, or where to find out why? Thanks in advance for any pointers.
(Apologies for the late reply, but I haven't really been in a position to until now.)
| [reply] |
The other reason not to do it is that you're liable to confuse the heck out of whoever picks up the code next. The remove case is easy to understand, but it's not at all clear to the average reader (who understands that hashes are unordered) what it means to do insert a new key in the middle of an iteration. "Do I see the key again later in the iteration? Uh... Uh... We'll have to run it and see."
I hate being backed into "we'll have to run it and see" mode, and try not do to that to others.
| [reply] |