The Anomalous each()

I was alerted to an anomaly that can occur when using each(), and I thought it might be useful to pass it along.

Consider the following code:

#!/usr/bin/perl
use strict;
use warnings;

my %data = (
    'baz' => 'zab',
    'bar' => 'rab',
    'foo' => 'oof',
    'key' => 'yek',
    'drop_out' => '- do a last -',
);

while (my ($key, $val) = each(%data)) {
    last if ($key eq 'drop_out');
    print("$key => $val\n");
}

print("-- Exited the loop.  Let's try it again...\n");

while (my ($key, $val) = each(%data)) {
    last if ($key eq 'die');
    print("$key => $val\n");
}

print("-- Done\n");
exit(0);
[download]

At first glance, you'd think the first loop would print out a few elements, and then exit. Then the second loop would start over processing the hash, and do the same. However, the output from this is:

    bar => rab
    baz => zab
    -- Exited the loop.  Let's try it again...
    foo => oof
    key => yek
    -- Done

"Holy nacreous arthropod, Batman! Is that some sort of bug in Perl?"

"Well, it certainly has a maleficent odor about it, Boy Wonder, but let's RTFM first."

From perlfunc under each we read:

There is a single iterator for each hash, shared by all "each", "keys", and "values" function calls in the program; it can be reset by reading all the elements from the hash, or by evaluating "keys HASH" or "values HASH".

The key points are that there is a single iterator, and that it is (only) reset after all the elements have been read. The anomaly is that the iterator is not reset when you leave the scope containing the each()!

This can lead to bugs, as the following illustrates:

#!/usr/bin/perl
use strict;
use warnings;

my %data = (
    'key' => 'yek',
    'foo' => 'oof',
    'bar' => 'rab',
    'baz' => 'zab',
    'die' => '- DOOM -',
);

eval { do_it(\%data); };

print("-- Opps, I died!  Try again...\n");

do_it(\%data);
exit(0);

###

sub do_it
{
    my $data = $_[0];

    while (my ($key, $val) = each(%{$data})) {
        die if ($key eq 'die');
        print("$key => $val\n");
    }
    print("-- Did it\n");
}
[download]

The above produces the following output:

    bar => rab
    baz => zab
    foo => oof
    -- Opps, I died!  Try again...
    key => yek
    -- Did it

Even when we die(), totally exit the scope of the subroutine, and then restart it, the each() is not reset.

Lessons Learned

The fist example teaches us: Watch out when using each() if the loop code can exit.
From the second example we learn: Be doubly cautious if the loop can be exited, and then later re-executed!

To me, this is definitely a feature that looks like a bug. So you probably won't be seeing me using each() except in really simple cases where I can be total sure it's safe, or for really huge hashes where keys() would not be suitable.

Update: Let me rephrase. I'm not saying it IS a bug. I'm trying to convey that this feature gives me the same feeling that a real bug would. I.e., something to be avoided or worked around. Or as demerphq so aptly puts it below: It has code smell.

Also, I realize that there are venerable monks that might read this, and say to themselves, "Well, I could have told you that." This post was not meant for you. It was meant for the others, like myself, that even though we may have read the each() perlfunc entry, we didn't quite catch the full implications and potential consequences of Perl's single iterator implementation.

Remember: There's always one more bug.

Comment on The Anomalous each() Select or Download Code

Replies are listed 'Best First'.
Re: The Anomalous each() by demerphq (Chancellor) on Nov 18, 2005 at 21:07 UTC
Whatever the other comments I'm glad you posted this. Its a mistake that I've observed even experienced monks make. As a general rule I consider each() to be a code smell. Any time I see it I investigate closer to see exactly what is going on. --- $world=~s/war/peace/g	[reply]
Re: The Anomalous each() by diotalevi (Canon) on Nov 18, 2005 at 19:52 UTC
No, you've reached an incorrect conclusion. each() works just fine and you don't have to care about how some previous loop exited and whether it reached the end or not. If you call values() or keys() in void context just prior to your new loop, you've reset the iterator to the beginning. void context keys()/values() are extremely cheap operations. Just do it. `keys %data; while ( ... = each %data ) { ... }` [download] Its dumb that a person has to know to use a call to keys() to reset the iterator. I don't like that at all.	[reply] [d/l]
Re^2: The Anomalous each() by fletcher_the_dog (Friar) on Nov 18, 2005 at 20:13 UTC
But you do have to worry if you for some reason call 'each' or 'keys' on a hash in an inner loop of an each loop. There is not really any way around this expect to use 'keys' in your outer loop instead of 'each'. `# this code creates an endless loop my %hash = ( foo=>"x", Foo=>"y", Bar=>"a", bar=>"c" ); while(my($key,$value)=each %hash){ # dumb example for my $key2 (keys %hash){ if(lc($key) eq lc($key2)){ print "key $key is similiar to $key2\n"; } } }` [download]	[reply] [d/l]
Re^3: The Anomalous each() by diotalevi (Canon) on Nov 18, 2005 at 20:15 UTC
Ok, so don't give your hash to functions you don't control if you think that's going to be a problem.	[reply]
Re^4: The Anomalous each() by jdhedden (Deacon) on Nov 18, 2005 at 20:39 UTC
Re^5: The Anomalous each() by Fairy Nuff (Novice) on Nov 18, 2005 at 20:43 UTC
Re^2: The Anomalous each() by jdhedden (Deacon) on Nov 18, 2005 at 20:19 UTC
Gee, so you're saying that using keys() in a void context is a workaround for this bug-like behavior. Your reply would seem to reinforce my conclusion. Further, from a code maintainability aspect, the use of keys() would definitely appear to be superfulous, and another maintainer might be inclined to remove it. I'll go with using a foreach loop instead: `foreach my $key (keys(%hash)) { my $val = $hash{$key}; ... }` [download] when my threashhold of comfort for using each() is not met. Remember: There's always one more bug.	[reply] [d/l]
Re^3: The Anomalous each() by diotalevi (Canon) on Nov 18, 2005 at 20:28 UTC
Its not a work-around so much as it is how you've always been supposed to use it. Its not a bug because its designed that way. Or at least isn't called a bug. Its pretty dumb behaviour and it makes it easy to make mistakes. The only time I'd ever use each() is on hashes which might have large numbers of keys. It'd be potentially wasteful or fatal to do a loop over keys(%hash) on an especially large hash. When that's possible, then you just have to bite the bullet, use each() and do it safely.	[reply]
Re^4: The Anomalous each() by jdhedden (Deacon) on Nov 18, 2005 at 20:54 UTC
Re: The Anomalous each() by Aristotle (Chancellor) on Nov 19, 2005 at 04:47 UTC
The same sort of thing will happen if you use `while( /match/g ) { ... }` and abort the loop and then execute a similar loop, or when you use another match against the same string within the loop body. Or if you use `while( <$fh> ) { ... }` and abort the loop and then execute a similar loop, or when you fiddle the filehandle within the loop body. Or… I don’t find each to be any more surprising than those cases. The actual difference is that these other cases offer a way to save and restore the iterator position (pos, tell/seek), whereas each does not, so it is harder to work with. Makeshifts last the longest.	[reply] [d/l] [select]
Re^2: The Anomalous each() by Tanktalus (Canon) on Nov 19, 2005 at 05:59 UTC
Just thinking out loud here with some untested code... `sub each_iter(%) { my $hash = shift; my @keys = keys %$hash; sub { my $nkey = shift @keys; $nkey => $hash->{$nkey}; } } my $iter = each_iter(%some_hash) while (my ($k, $v) = $iter->()) { # do stuff }` [download] This kind of reminds me of Writing Solid Code's rewrite of realloc. Any use of realloc would be "code smell" as well, so Steve Maguire diligently set out to prove it, then rewrite it safely. If it stinks so bad, what can we do to provide a safer version? The above is probably as good as it gets without XS code. Perhaps if someone has a Hash::Util module to stick this into... Anyway, just because a language comes with a function doesn't mean it's always the best. At the time, each was probably quite nifty. So was realloc. Hindsight is 20/20, and now we probably know better. Which is why, as I recall, perl 6 will do this just a wee bit differently. Update: I agree with Aristotle that it's not even close to ideal. However, I'm not sure it's "not very helpful". For small hashes (for some definition of small which may depend on free memory), it can indeed be incredibly helpful: you can run the each iterator inside itself - i.e., two loops that absolutely must run inside each other. You can't do this without taking a copy of the keys anyway, so this provides the syntactical sugar to do it the each way that you want. It also provide a small-change to existing code (create the iterator on the previous line, change the each call to `$iter->()`) which you want to get working quickly. It means you can get going again quickly. Personally, I almost never use each to begin with - probably from this exact same problem. So it's not as helpful to me, but I could see reasons to use this function. Besides - I put it out there more to try to prompt someone with the XS skills to try it than to propose it as a "good" solution".	[reply] [d/l] [select]
Re^3: The Anomalous each() by Aristotle (Chancellor) on Nov 19, 2005 at 06:28 UTC
There is a Hash::Util, which is core in 5.8. There also is Hash::MoreUtils (although it doesn’t look like nearly as useful an addition as its cousin List::MoreUtils is). I agree that having an iterator maker function handle this is the better idea, but doing it in pure Perl is not very helpful. The point of each is to avoid building a huge list – f.ex., when you’re dealing with very large DBM files, you really don’t want to use keys on them. The code you show does not achieve this. I can’t think of any case in which I’d prefer it over a straight `foreach( keys %hash )` (not even to pass the iterator around – I can just as well pass an array of keys around). If anyone has the XS chops to take this on, it would be good idea. Makeshifts last the longest.	[reply] [d/l]
Re^3: The Anomalous each() by Aristotle (Chancellor) on Nov 19, 2005 at 11:10 UTC
For small hashes (for some definition of small which may depend on free memory), it can indeed be incredibly helpful: you can run the each iterator inside itself – i.e., two loops that absolutely must run inside each other. But you can do that just as well with two nested `foreach( keys %hash )` loops. There is nothing you can do with your function that cannot be done more straightforwardly with direct use of keys. each is necessary only when memory is a concern; your solution does not provide for that. Makeshifts last the longest.	[reply] [d/l]
Re^4: The Anomalous each() by Tanktalus (Canon) on Nov 19, 2005 at 15:49 UTC
Re^5: The Anomalous each() by Aristotle (Chancellor) on Nov 19, 2005 at 16:13 UTC
Re: The Anomalous each() by hossman (Prior) on Nov 18, 2005 at 20:45 UTC
This behavior isn't that suprising if you remember that `each` isn't returning you an iterator which is magically affected by the operations on previously used iterators -- or can magically affect subsequent iterators. `each` is advancing the existing iterator contained in the internals of the hash you are working with -- that iterator is shared by everyone who has a refrence to that hash, in the same way the data inside the hash is shared by everyone with a refrence to it. Put another way: If perl looked more like java, then i'd agree with you that this is a bug... `my $iter = %data.iter; while (my ($key, $val) = each($iter)) { last if ($key eq 'drop_out'); print("$key => $val\n"); }` [download] ...but as it stands, this behavior is no more suprising to me then if you were looping over the keys to a hash, and modifying the values -- and then outside of the loop you discovered that the hash had been changed.	[reply] [d/l] [select]
Re: The Anomalous each() by bart (Canon) on Nov 19, 2005 at 10:55 UTC
The key points are that there is a single iterator, and that it is (only) reset after all the elements have been read. The anomaly is that the iterator is not reset when you leave the scope containing the each()! Huh? each has no scope, it is global. Maybe it'd indeed be nice if indeed it respected subs as a scope. But, like you said: it doesn't. But I never use each. Behaviour like you described, which is inherent in the model, is just too low level to my taste.	[reply]
Re: The Anomalous each() by Perl Mouse (Chaplain) on Nov 20, 2005 at 17:03 UTC
It's not a bug (nor an anomaly). It's intended, useful, behaviour. I use `each` all the time, and I've never been bitten by it. It's no different from `<STDIN`, `m//g`, or `readdir`. That's what iterators are supposed to do. And I think it's bad advice to say it's something to be avoided or worked around. `Perl --((8:>*`	[reply] [d/l] [select]