Re: The Anomalous each()
by demerphq (Chancellor) on Nov 18, 2005 at 21:07 UTC
|
Whatever the other comments I'm glad you posted this. Its a mistake that I've observed even experienced monks make. As a general rule I consider each() to be a code smell. Any time I see it I investigate closer to see exactly what is going on.
---
$world=~s/war/peace/g
| [reply] |
Re: The Anomalous each()
by diotalevi (Canon) on Nov 18, 2005 at 19:52 UTC
|
No, you've reached an incorrect conclusion. each() works just fine and you don't have to care about how some previous loop exited and whether it reached the end or not. If you call values() or keys() in void context just prior to your new loop, you've reset the iterator to the beginning. void context keys()/values() are extremely cheap operations. Just do it.
keys %data;
while ( ... = each %data ) {
...
}
Its dumb that a person has to know to use a call to keys() to reset the iterator. I don't like that at all. | [reply] [d/l] |
|
|
But you do have to worry if you for some reason call 'each' or 'keys' on a hash in an inner loop of an each loop. There is not really any way around this expect to use 'keys' in your outer loop instead of 'each'.
# this code creates an endless loop
my %hash = (
foo=>"x",
Foo=>"y",
Bar=>"a",
bar=>"c"
);
while(my($key,$value)=each %hash){
# dumb example
for my $key2 (keys %hash){
if(lc($key) eq lc($key2)){
print "key $key is similiar to $key2\n";
}
}
}
| [reply] [d/l] |
|
|
Ok, so don't give your hash to functions you don't control if you think that's going to be a problem.
| [reply] |
|
|
|
|
|
|
Gee, so you're saying that using keys() in a void context is a workaround for this bug-like behavior. Your reply would seem to reinforce my conclusion.
Further, from a code maintainability aspect, the use of keys() would definitely appear to be superfulous, and another maintainer might be inclined to remove it.
I'll go with using a foreach loop instead:
foreach my $key (keys(%hash)) {
my $val = $hash{$key};
...
}
when my threashhold of comfort for using each() is not met.
Remember: There's always one more bug.
| [reply] [d/l] |
|
|
Its not a work-around so much as it is how you've always been supposed to use it. Its not a bug because its designed that way. Or at least isn't *called* a bug. Its pretty dumb behaviour and it makes it easy to make mistakes. The only time I'd ever use each() is on hashes which might have large numbers of keys. It'd be potentially wasteful or fatal to do a loop over keys(%hash) on an especially large hash. When that's possible, then you just have to bite the bullet, use each() and do it safely.
| [reply] |
|
|
Re: The Anomalous each()
by Aristotle (Chancellor) on Nov 19, 2005 at 04:47 UTC
|
The same sort of thing will happen if you use while( /match/g ) { ... } and abort the loop and then execute a similar loop, or when you use another match against the same string within the loop body.
Or if you use while( <$fh> ) { ... } and abort the loop and then execute a similar loop, or when you fiddle the filehandle within the loop body.
Or…
I don’t find each to be any more surprising than those cases.
The actual difference is that these other cases offer a way to save and restore the iterator position (pos, tell/seek), whereas each does not, so it is harder to work with.
Makeshifts last the longest.
| [reply] [d/l] [select] |
|
|
sub each_iter(%)
{
my $hash = shift;
my @keys = keys %$hash;
sub {
my $nkey = shift @keys;
$nkey => $hash->{$nkey};
}
}
my $iter = each_iter(%some_hash)
while (my ($k, $v) = $iter->())
{
# do stuff
}
This kind of reminds me of Writing Solid Code's rewrite of realloc. Any use of realloc would be "code smell" as well, so Steve Maguire diligently set out to prove it, then rewrite it safely. If it stinks so bad, what can we do to provide a safer version? The above is probably as good as it gets without XS code. Perhaps if someone has a Hash::Util module to stick this into... Anyway, just because a language comes with a function doesn't mean it's always the best. At the time, each was probably quite nifty. So was realloc. Hindsight is 20/20, and now we probably know better. Which is why, as I recall, perl 6 will do this just a wee bit differently.
Update: I agree with Aristotle that it's not even close to ideal. However, I'm not sure it's "not very helpful". For small hashes (for some definition of small which may depend on free memory), it can indeed be incredibly helpful: you can run the each iterator inside itself - i.e., two loops that absolutely must run inside each other. You can't do this without taking a copy of the keys anyway, so this provides the syntactical sugar to do it the each way that you want. It also provide a small-change to existing code (create the iterator on the previous line, change the each call to $iter->()) which you want to get working quickly. It means you can get going again quickly. Personally, I almost never use each to begin with - probably from this exact same problem. So it's not as helpful to me, but I could see reasons to use this function.
Besides - I put it out there more to try to prompt someone with the XS skills to try it than to propose it as a "good" solution". | [reply] [d/l] [select] |
|
|
There is a Hash::Util, which is core in 5.8. There also is Hash::MoreUtils (although it doesn’t look like nearly as useful an addition as its cousin List::MoreUtils is).
I agree that having an iterator maker function handle this is the better idea, but doing it in pure Perl is not very helpful. The point of each is to avoid building a huge list – f.ex., when you’re dealing with very large DBM files, you really don’t want to use keys on them.
The code you show does not achieve this. I can’t think of any case in which I’d prefer it over a straight foreach( keys %hash ) (not even to pass the iterator around – I can just as well pass an array of keys around).
If anyone has the XS chops to take this on, it would be good idea.
Makeshifts last the longest.
| [reply] [d/l] |
|
|
For small hashes (for some definition of small which may depend on free memory), it can indeed be incredibly helpful: you can run the each iterator inside itself – i.e., two loops that absolutely must run inside each other.
But you can do that just as well with two nested foreach( keys %hash ) loops. There is nothing you can do with your function that cannot be done more straightforwardly with direct use of keys. each is necessary only when memory is a concern; your solution does not provide for that.
Makeshifts last the longest.
| [reply] [d/l] |
|
|
|
|
Re: The Anomalous each()
by hossman (Prior) on Nov 18, 2005 at 20:45 UTC
|
This behavior isn't that suprising if you remember that each isn't returning you an iterator which is magically affected by the operations on previously used iterators -- or can magically affect subsequent iterators.
each is advancing the existing iterator contained in the internals of the hash you are working with -- that iterator is shared by everyone who has a refrence to that hash, in the same way the data inside the hash is shared by everyone with a refrence to it.
Put another way: If perl looked more like java, then i'd agree with you that this is a bug...
my $iter = %data.iter;
while (my ($key, $val) = each($iter)) {
last if ($key eq 'drop_out');
print("$key => $val\n");
}
...but as it stands, this behavior is no more suprising to me then if you were looping over the keys to a hash, and modifying the values -- and then outside of the loop you discovered that the hash had been changed.
| [reply] [d/l] [select] |
Re: The Anomalous each()
by bart (Canon) on Nov 19, 2005 at 10:55 UTC
|
The key points are that there is a single iterator, and that it is (only) reset after all the elements have been read. The anomaly is that the iterator is not reset when you leave the scope containing the each()!
Huh? each has no scope, it is global. Maybe it'd indeed be nice if indeed it respected subs as a scope. But, like you said: it doesn't.
But I never use each. Behaviour like you described, which is inherent in the model, is just too low level to my taste.
| [reply] |
Re: The Anomalous each()
by Perl Mouse (Chaplain) on Nov 20, 2005 at 17:03 UTC
|
It's not a bug (nor an anomaly). It's intended, useful, behaviour. I use each all the time, and I've never been bitten by it. It's no different from <STDIN, m//g, or readdir. That's what iterators are supposed to do. And I think it's bad advice to say it's something to be avoided or worked around.
| [reply] [d/l] [select] |