rminner has asked for the wisdom of the Perl Monks concerning the following question:

#
=begin perlmonkcomment
Today I ran into a small problem:

I wrote a small helper function that fetches some data from a hash.
I am not accessing the hash directly, because I do not know the exact name of the key, so I am forced to pattern-match against all hash keys (well, at least until I get a match).

Strangely, it sometimes found the element in the hash and sometimes didn't, even though the data was always present. It took me some time to find out that the iterator for that hash wasn't being reset.

Quoting the Perl FAQ:

How do I reset an each() operation part-way through?

> Using keys %hash in scalar context returns the number of keys in the hash and resets the iterator associated with the hash. You may need to do this if you use last to exit a loop early so that when you re-enter it, the hash iterator has been reset.

What confused me was that I have to do this resetting every time I enter my subroutine. (The problem also occurs if I exit the loop with a return statement instead of last.)

The thing is, I tend to use those while ( my ($x ,$y) = each %hash) constructs quite frequently to iterate over all elements of a hash. Unfortunately, this behaviour would force me to reset the hash iterator manually by calling keys every time I use this construct, to be really sure that I am truly iterating over all elements.

Now my question is: is it always necessary to reset the iterator before using each, or is there a trick that saves me from calling keys every time before using it? (As I said, I use it to iterate over all elements.)

Demo program for the problem (not my real function, but a simplified example):
=cut

$::WORKAROUND = 0;

sub find_in {
    my ($data_ref, $string) = @_;
    my $result = 'no result';
    # reset the iterator for %$data_ref:
    keys %$data_ref if ($::WORKAROUND);
    while ( my ($name , $value) = each %$data_ref) {
        if (lc($name) eq lc($string)) {
            $result = $value;
            last;
        }
    }
    return $result;
}

sub look_for_data {
    my %hash = ( A => 'xxx' , B => 'xxx' , C => 'xxx' , D => 'xxx');
    print "D: " . find_in(\%hash , 'D') . "\n";
    print "A: " . find_in(\%hash , 'A') . "\n";
    print "B: " . find_in(\%hash , 'B') . "\n";
    print "B: " . find_in(\%hash , 'B') . "\n";
}

print "# NO workaround:\n";
look_for_data();
print "# WITH workaround:\n";
$::WORKAROUND=1;
look_for_data();

=begin perlmonk_comment
Output of the program:

# NO workaround:
D: xxx
A: no result
B: xxx
B: no result
# WITH workaround:
D: xxx
A: xxx
B: xxx
B: xxx

=cut
#
UPDATE: As pointed out by jdhedden, my problem is already covered in the thread The Anomalous each()

Replies are listed 'Best First'.
Re: when do i know that the iterator for a hash was reseted
by japhy (Canon) on Apr 20, 2006 at 12:34 UTC
    Short answer: you should always reset the iterator before you use each on a hash in a function.

    Long answer: I think that you should always reset the hash iterator before you use each() on a hash in a function.
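    The advice above can be sketched as follows. This is a minimal illustration, not code from the thread: find_ci and the sample data are hypothetical names, and it assumes the documented behaviour that calling keys in void context resets the iterator that each uses.

```perl
use strict;
use warnings;

# Hypothetical helper: case-insensitive lookup that always starts
# from a freshly reset iterator.
sub find_ci {
    my ($data_ref, $string) = @_;

    keys %$data_ref;    # void context: resets the each() iterator

    while ( my ($name, $value) = each %$data_ref ) {
        # Returning early leaves the iterator part-way through the hash,
        # which is why the reset above is needed on the next call.
        return $value if lc($name) eq lc($string);
    }
    return 'no result';
}

my %hash  = ( A => 'a-val', B => 'b-val', C => 'c-val', D => 'd-val' );
my @found = map { find_ci(\%hash, $_) } qw( d a b b z );
print join(', ', @found), "\n";
```

    Because the reset happens inside the function, callers do not need to know whether an earlier lookup left the loop early.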


    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
Re: when do i know that the iterator for a hash was reseted
by Hue-Bond (Priest) on Apr 20, 2006 at 12:49 UTC

    I see no need to use each in this case:

    sub find_in {
        my ($data_ref, $string) = @_;
        my $result = 'no result';
        my ($key) = grep /^$string$/i, keys %$data_ref;
        $result = $$data_ref{$key};
        return $result;
    }

    Update: condensed and with a modified data set:

    sub find_in {
        my ($data_ref, $string) = @_;
        return $$data_ref{ (grep /^$string$/i, keys %$data_ref)[0] || '' }
            || 'no result';
    }

    sub look_for_data {
        my %hash = ( A => 'axxx' , B => 'bxxx' , C => 'cxxx' , D => 'dxxx' );
        print "d: " . find_in(\%hash , 'd') . "\n";
        print "A: " . find_in(\%hash , 'A') . "\n";
        print "B: " . find_in(\%hash , 'B') . "\n";
        print "Z: " . find_in(\%hash , 'Z') . "\n";
    }

    look_for_data();

    __END__
    d: dxxx
    A: axxx
    B: bxxx
    Z: no result

    Update 2: Added ^ and $ anchors as per jhourcle advice.

    Update 3: Same again.

    --
    David Serrano

      Be careful with that regex:

      (lc($name) eq lc($string)) and ($name =~ /$string/i) are not the same: the regex matches any occurrence of $string within $name (i.e. substrings of $name as well).
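      To make the difference concrete, here is a small sketch comparing the three tests. The search string and key names are made up for illustration. It also quotes the string with \Q...\E, which the code in this thread does not do; without quoting, a "." in $string acts as a regex wildcard.

```perl
use strict;
use warnings;

my $string = 'a.b';    # hypothetical search string containing a metacharacter
my %match;

for my $name ( 'A.B', 'xA.By', 'aXb' ) {
    $match{$name} = {
        # exact comparison, ignoring case
        eq       => ( lc($name) eq lc($string)    ? 1 : 0 ),
        # unanchored, unquoted regex: matches substrings, and "." matches anything
        raw      => ( $name =~ /$string/i         ? 1 : 0 ),
        # anchored and quoted: matches only the whole key, literally
        anchored => ( $name =~ /\A\Q$string\E\z/i ? 1 : 0 ),
    };
    printf "%-6s eq=%d raw=%d anchored=%d\n",
        $name, @{ $match{$name} }{qw( eq raw anchored )};
}
```

      For these plain ASCII keys the anchored, quoted regex agrees with the lc comparison, while the raw regex accepts all three names.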

      For the original poster -- have you considered using a case insensitive hash? See Tie::CPHash

        Thanks for the hint concerning Tie::CPHash; it might save me some work in the future.
      Thanks for your answer.
      Actually, I think there is never a _real_ need to use each, as you can always do the same thing using keys. The thing is, I like to use each, because code for deeply nested hashes tends to be a bit more readable with each (imo).
      But I guess in some cases where I use while each, your code might be a good inspiration.
      From what you and japhy posted, it seems to me that the only reliable way of getting all elements is keys, as with each you will "never know".
      I find it somewhat strange that while (my ($key , $val) = each %hash) is the recommended way (perlfaq) of getting all elements from a hash, as the behaviour can get quirky rather quickly.
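      As a rough sketch of that point (the data is made up): a foreach over keys builds its own complete key list and resets the iterator, so it is not affected by an each that was abandoned part-way through.

```perl
use strict;
use warnings;

my %hash = ( A => 1, B => 2, C => 3, D => 4 );

# Simulate a loop that was left early: the iterator now sits part-way
# through the hash.
my ($first_key) = each %hash;

# keys returns every key regardless of where the iterator was left.
my @seen;
for my $key ( sort keys %hash ) {
    push @seen, "$key=$hash{$key}";
}
print "@seen\n";
```

      The trade-off is that keys builds the whole key list up front, whereas each hands back one pair at a time.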
Re: when do i know that the iterator for a hash was reseted
by jdhedden (Deacon) on Apr 20, 2006 at 13:13 UTC
      Oops. Thanks and sorry. I didn't find it when I searched the archive.
Re: when do i know that the iterator for a hash was reseted
by ff (Hermit) on Apr 21, 2006 at 04:30 UTC
    Why use an iterator when you can go straight to the horse's mouth? Try exists on the string and if you miss, try exists on a lower-case version of the string?

    use strict;

    sub find_inB {
        my ($data_hr, $string) = @_;
        my $no_result = 'no result';
        return (
            exists $$data_hr{ $string }    ? $$data_hr{ $string }
          : exists $$data_hr{ lc $string } ? $$data_hr{ lc $string }
          :                                  $no_result
        );
    }

    sub look_for_dataB {
        my %hash = map { ($_, 'xxx'); } qw( A B C D );
        print "D: " . find_inB(\%hash , 'D') . "\n";
        print "A: " . find_inB(\%hash , 'A') . "\n";
        print "B: " . find_inB(\%hash , 'B') . "\n";
        print "B: " . find_inB(\%hash , 'B') . "\n";
    }

    print "# look_for_dataB:\n";
    look_for_dataB();

    __END__
    # look_for_dataB:
    D: xxx
    A: xxx
    B: xxx
    B: xxx
      Well, for example, this doesn't help if I don't know the case of the key string, as I won't be trying every possible case permutation of a string (String sTring stRing strIng STrING ...) with an exists test.

      It also doesn't help when my hash key is of the form "What_I_am_looking_for Irrelevant_String", because I don't necessarily know what Irrelevant_String is.

      My problem isn't that I am not finding the data in the hash. I was primarily confused by the behaviour of each, which came down to my not knowing how it works.

      (I posted it partly so that somebody else with the same problem can find the answer on the net.)
        True, though I was keying off your line which merely required

        if (lc($name) eq lc($string)) {
        I'd say the thrust is still valid, though, assuming you are the one constructing the hash in the first place. That is, why not, at the point you create the hash key, normalize it to lowercase with:

        my %hash;
        my $next_key = 'What_I_am_looking_for Irrelevant_String';
        my $data_for_next_key = 'xxx';
        $hash{ lc $next_key } = [ ($next_key, $data_for_next_key) ];
        print "Value associated with '$next_key' is '",
            $hash{ lc $next_key }->[1],    # or $hash{ lc $next_key }[1]
            "'\n";

        and now a test for exists would only ever have to check a lower-case version of the string in question. If your hash is at all sizeable, I'd think that using a little extra RAM to store the original key would be beneficial: you could avoid iterating through the keys every time you need to do this check.

        This assumes that you don't have duplicate entries because of case, i.e. it will have to be okay that $hash{ aBc } clobbers $hash{ ABC }.
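        If silent clobbering is a concern, one option is to record collisions while building the hash. This is only a sketch with made-up data; @pairs and @collisions are hypothetical names, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical input: original keys paired with their data.
my @pairs = (
    [ 'ABC'       => 'first'  ],
    [ 'Other Key' => 'second' ],
    [ 'aBc'       => 'third'  ],
);

my ( %hash, @collisions );
for my $pair (@pairs) {
    my ( $key, $data ) = @$pair;
    my $norm = lc $key;

    # Remember keys that differ from an earlier key only by case.
    push @collisions, $key if exists $hash{$norm};

    # Later entries overwrite earlier ones, as described above.
    $hash{$norm} = [ $key, $data ];
}

my $hit = $hash{ lc 'ABC' };
print "Lookup for 'ABC' found '$hit->[0]' with data '$hit->[1]'\n";
print "Keys that clobbered an earlier entry: @collisions\n";
```

        Whether to warn, die, or keep both values on a collision depends on the data; the sketch only records that it happened.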