
Having spent the best part of an hour chasing down a bug in a hash loop, I feel the need to share my concluding thoughts. Imagine the following fairly innocent-sounding beginner question:

I have a hash, and I want a loop that processes every element. What's the best way to do it?

An experienced programmer may respond with further questions, such as "Do the elements need to be processed in key order?" and "Is there anything special about the hash, such as being a tied hash?" - at which point the novice's eyes glaze over at the mention of tied hashes.

The most efficient way to do what is asked is as follows: (benchmarking is left as an exercise for the reader :)


while (my ($key, $value) = each %my_hash) {
   # ... some processing on the element
}

This is efficient because we are using the 'each' function as an iterator. Rather than collating a list of all the hash members and then processing each one, successive calls to 'each' return one more member at a time, until the hash is exhausted.
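To make the iterator behaviour concrete, here is a small sketch that calls 'each' by hand instead of in a loop. The hash contents are made up for illustration; the point is only that each call hands back one key/value pair, and an empty list once the hash is exhausted.

```perl
use strict;
use warnings;

# Illustrative data only; any small hash would do.
my %my_hash = (one => 1, two => 2);

# Each call in list context returns a single (key, value) pair.
my @first  = each %my_hash;
my @second = each %my_hash;

# The hash has only two members, so the third call returns an empty list.
# This is what ends the while loop in the example above.
my @third  = each %my_hash;

printf "%d %d %d\n", scalar @first, scalar @second, scalar @third;   # prints "2 2 0"
```

After returning the empty list, the iterator starts over, so the next call to 'each' would return the first pair again.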

Where can this go wrong?

It is all very well if the code inside the loop is simple. Consider what happens when something inside the loop calls 'keys' or 'values' on the hash, or flattens the whole hash into a list (copying it, for example). Any of these resets the placeholder used by 'each'; merely looking up a single member does not. Disaster: the loop loops forever over the same element!
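Here is a minimal sketch of that failure. The helper count_entries is hypothetical; it stands in for code buried several calls deep that happens to call 'keys' on the same hash. The counter is only there so the demonstration terminates.

```perl
use strict;
use warnings;

my %my_hash = (a => 1, b => 2, c => 3);

# Hypothetical helper: looks harmless, but calling keys (even in
# scalar context) resets the iterator that each relies on.
sub count_entries {
    my ($hash_ref) = @_;
    return scalar keys %$hash_ref;
}

my %seen;
my $iterations = 0;
while (my ($key, $value) = each %my_hash) {
    $seen{$key}++;
    count_entries(\%my_hash);          # iterator goes back to the start
    last if ++$iterations >= 10;       # safety guard for the demo only
}
keys %my_hash;                         # leave the iterator in a clean state

printf "iterations: %d, distinct keys seen: %d\n",
    $iterations, scalar keys %seen;    # prints "iterations: 10, distinct keys seen: 1"
```

Without the guard, the loop would never finish: every pass sees the same first element.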

This can happen several subroutine calls deep, and can be difficult to track down. In my case, I gave up tracking the bug through all the nested calls, and instead rewrote the loop as follows:

for my $key (keys %my_hash) {
   my $value = $my_hash{$key};    # and no need to change the code below
   # ... some processing on the element
}

This has fixed my bug, and the code works as desired. However, it occurs to me that there is a circumstance in which this approach will also fail: when the code is inserting and/or deleting members of the hash.

Here is a completely safe way to have a loop that involves deletions and/or insertions, without skipping members or losing track of where you are. This code iterates over the original set of hash keys:

for my $key (my @temp = keys %my_hash) {
   my $value = $my_hash{$key};    # and no need to change the code below
   # ... some processing on the element
}

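Since the set of keys is now held in the array @temp, it is safe to iterate over these values whatever happens to the actual hash keys in the meantime. Bear in mind that a key deleted earlier in the loop will still turn up in @temp, and that keys inserted during the loop will not be visited.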
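As a sketch of the snapshot approach with a loop that both deletes and inserts, consider the following. The data, the 'sort' (added only to make the order predictable) and the 'exists' check are my assumptions for illustration, not part of the loop above.

```perl
use strict;
use warnings;

# Illustrative data only.
my %my_hash = (apple => 1, banana => 2, cherry => 3, date => 4);

my @processed;
for my $key (my @temp = sort keys %my_hash) {
    # A key removed earlier in the loop is still in @temp, so skip it.
    next unless exists $my_hash{$key};

    my $value = $my_hash{$key};
    push @processed, $key;

    # Modify the hash while iterating over the saved keys.
    delete $my_hash{banana} if $key eq 'apple';
    $my_hash{"${key}_copy"} = $value;   # new keys are not in @temp
}

print "processed: @processed\n";                  # prints "processed: apple cherry date"
print "keys now: ", scalar(keys %my_hash), "\n";  # prints "keys now: 6"
```

The deleted key is skipped and the inserted keys are never visited, because the loop only ever walks the original set of keys.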

Conclusion:

If you know that nothing you do inside the loop will interfere with the hash, the 'each' approach is best. Otherwise, if you are not changing keys, use the middle approach. Finally, save the keys into a temporary array if you know they could be changing.

In reply to Iterating hashes safely and efficiently by rinceWind
