Ojosh!ro has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks,

I was wondering whether there are rules for the strings I use as hash keys and, if so, which ones. So far I haven't encountered any, but I hope to prevent future problems.
Searches haven't really turned up anything either but I could very well have tried the wrong words.

Replies are listed 'Best First'.
Re: Hash key string restrictions?
by Joost (Canon) on Feb 10, 2007 at 00:05 UTC
    The only rule is that hash keys are always strings, while normal perl scalars can be a number of types: strings, references, objects (blessed references), integers and floats.

    That means you can for example use objects as keys but you cannot get an object out of a key; instead you will get the "stringified" version of the object - and the hash won't see the difference between a stringified object and the object itself.
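    The stringification described above can be seen in a minimal sketch. The class name My::Thing and the variable names are made up for illustration; only core Perl is assumed.

```perl
use strict;
use warnings;

# A hypothetical blessed reference, used only for illustration.
my $obj = bless { name => 'example' }, 'My::Thing';

my %seen;
$seen{$obj} = 1;    # the key stored is the string form of $obj

my ($key) = keys %seen;

# What comes back out is a plain string, not a reference.
my $key_is_ref = ref($key) ? 1 : 0;

# The object and its stringified form address the same hash entry.
my $same_entry = exists $seen{"$obj"} ? 1 : 0;

print "key looks like: $key\n";    # something of the form My::Thing=HASH(0x...)
print "key is a reference: $key_is_ref\n";
print "same entry: $same_entry\n";
```

    If the object itself is needed later, one common approach is to keep it in the hash value, since values are ordinary scalars.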

    There is also the subtle issue that the taint flag isn't kept for hash keys; in other words if you use a tainted value as a hash key and then read the key, the new value will be untainted.

    The best way to think about this is that hash keys are always "pure" strings with no additional data whatsoever.

    None of this applies to hash values by the way. They're regular old perl scalars.

Re: Hash key string restrictions?
by dragonchild (Archbishop) on Feb 10, 2007 at 01:21 UTC
    The important thing here is that variables used as keys are stringified. So, objects will be stringified. Also, undef will be stringified, as demonstrated:
        my %x;
        my $u = undef;
        my $s = "";
        $x{$u} = 1;
        $x{$s} = 2;
        print "$x{$u}\n";
    These aren't limitations on what can be used, but consequences of using certain values.
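    The same consequence shows up with numbers. The sketch below is not from the post above; it assumes only core Perl, and the variable names are illustrative. A numeric value and a string that merely looks like it can end up as different keys, because the key is whatever the value stringifies to.

```perl
use strict;
use warnings;

my %h;

my $number = 1.0;      # a number; it stringifies to "1"
my $string = '1.0';    # a string; it stays "1.0"
my $sum    = 0.5 + 0.5;    # also the number 1

$h{$number} = 'number';
$h{$string} = 'string';
$h{$sum}   .= '!';     # appends to the entry stored under "1"

my @keys = sort keys %h;
print join(', ', map { "'$_' => $h{$_}" } @keys), "\n";
```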

    My criteria for good software:
    1. Does it work?
    2. Can someone else come in, make a change, and be reasonably certain no bugs were introduced?
Re: Hash key string restrictions?
by liverpole (Monsignor) on Feb 10, 2007 at 00:24 UTC
    Hi Ojosh!ro,

    That is most definitely a unique question!

    As ikegami and Joost have said, any string will work.

    Having said that, though, there is one exception:  an empty string will NOT work, but if you should try to use one, you'll get a syntax error (even in the absence of strict and warnings).  (Update: as Limbic~Region correctly advises, even an empty string is legal.  My test case was apparently in error.)

    I thought any string would be legal, but I wanted to be certain, so I wrote the following test program for fun:

        # Strict
        use strict;
        use warnings;

        # Libraries;
        use Data::Dumper;

        # Main program
        $| = 1;

        my $phash  = { };
        my $parray = [ ];
        my $ndiff  = 0;

        # Save all single characters and character pairs as hash keys
        for (my $i = 0; $i < 256; $i++) {
            save_to_hash_and_array($i);
            for (my $j = 0; $j < 256; $j++) {
                save_to_hash_and_array($i, $j);
            }
        }

        # Test all single characters
        for (my $i = 0; $i < 256; $i++) {
            my $is_same = hash_matches_array($i)? 1: 0;
            $is_same or $ndiff++;
            printf "%s", $is_same? ".": "@";
        }
        print "\n\n";

        # Test all character pairs
        for (my $i = 0; $i < 256; $i++) {
            for (my $j = 0; $j < 256; $j++) {
                my $is_same = hash_matches_array($i, $j)? 1: 0;
                $is_same or $ndiff++;
                printf "%s", $is_same? ".": "@";
            }
        }
        print "\n\n";
        print "Number of mismatches = $ndiff\n";

        # Uncomment out this line to see a lot of strange character
        print "Type [RETURN] to see a dump of the hash ... ";
        <STDIN>;
        print Dumper($phash), "\n";

        sub save_to_hash_and_array {
            my ($idx0, $idx1) = @_;
            my $key = chr($idx0);
            defined($idx1) and $key .= chr($idx1);
            my $value = sprintf "(%s:%s)", $idx0, ($idx1 || "blank");
            $phash->{$key} = $value;
            $parray->[$idx0 + 256*($idx1||0)] = $value;
        }

        sub hash_matches_array {
            my ($idx0, $idx1) = @_;
            my $key = chr($idx0);
            defined($idx1) and $key .= chr($idx1);
            my $hval = $phash->{$key};
            my $aval = $parray->[$idx0 + 256*($idx1||0)];
            return ($hval eq $aval)? 1: 0;
        }

    Sure enough, it confirmed what I suspected.  No difference between the value assigned to each key and the result pulled out of the hash later (as compared against the same value saved in an array), for all possible single characters, and all possible character pairs for good measure.

    But I'm still very curious -- how on earth did such a question occur to you?!


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
      liverpole,
      I didn't read your entire post, just a couple of lines.

      Having said that, though, there is one exception: an empty string will NOT work,...

      I am not sure where you got that idea but an empty string works fine:
          #!/usr/bin/perl
          use strict;
          use warnings;

          my %hash;
          $hash{''} = 'foo';
          print $hash{''};

      But I'm still very curious -- how on earth did such a question occur to you?

      My guess would be a background in another language. Certain C libraries have restrictions on hash key length, for instance, while Java allows you to store objects as hash keys.

      Cheers - L~R

Re: Hash key string restrictions?
by ikegami (Patriarch) on Feb 09, 2007 at 23:51 UTC

    You should be able to use any string.

    A string of characters and its UTF-8 encoding are even treated as distinct strings, even though strings of characters are stored as UTF-8 internally.

      A string of characters and its UTF-8 encoding are even treated as distinct strings, even though strings of characters are stored as UTF-8 internally.
      Yowza! That doesn't sound right, at least not in the Perl 5.8.x world. On 5.6.x, I guess it's kind of normal that a UTF-8 string and a bytes string are not the same.

      So, what exactly is the situation? Does anyone with a clue about the actual implementation feel like explaining what the details are really like? Or, if no one comes forward, would anyone with a lot of courage, time, and a real hunger for knowledge feel like spitting this out?

        That doesn't sound right

        I beg to differ. They are not the same. Hashes should consider them different just like eq does. In perlguts speak, the UTF8 flag is preserved and observed (unlike the taint flag). That flag is a very important piece of information, and it should not be ignored.

        Strings of chars and strings of encoded chars (no matter what the encoding) are not the same thing.

            use Test::Simple tests => 6;

            use strict;
            use warnings;

            #use Devel::Peek qw( Dump );
            use Encode qw( encode _utf8_on _utf8_off);

            {
                ok(1);

                my $chars_suites = "\x{2660}\x{2661}\x{2662}\x{2663}";
                my $utf8_suites  = encode('UTF-8', $chars_suites);

                _utf8_off(my $utf8_suites2  = $chars_suites);
                _utf8_on (my $chars_suites2 = $utf8_suites);

            #   # Make sure they're the same, UTF8 flag notwithstanding.
            #   print("\n");
            #   print("chars_suites:\n");  Dump($chars_suites);
            #   print("\n");
            #   print("chars_suites2\n");  Dump($chars_suites2);
            #   print("\n");
            #   print("utf8_suites:\n");   Dump($utf8_suites);
            #   print("\n");
            #   print("utf8_suites2\n");   Dump($utf8_suites2);
            #   print("\n");

                ok($chars_suites  eq $chars_suites2);
                # Compare against the second copy (the original compared
                # $utf8_suites with itself, which is always true).
                ok($utf8_suites   eq $utf8_suites2 );
                ok($chars_suites  ne $utf8_suites  );
                ok($chars_suites2 ne $utf8_suites2 );

                my %hash;
                $hash{$chars_suites } .= 'a';
                $hash{$chars_suites2} .= 'b';
                $hash{$utf8_suites  } .= 'c';
                $hash{$utf8_suites2 } .= 'd';

                ok($hash{$chars_suites} eq 'ab' && $hash{$utf8_suites } eq 'cd');
            }
Re: Hash key string restrictions?[THANKS]
by Ojosh!ro (Beadle) on Feb 10, 2007 at 01:17 UTC

    Thank you all very much. This is the answer I hoped for.
    As to the purpose of the question:
    I was experimenting with eval and wondered if I could put sub-references in a hash to prevent double evals.
    (The mechanism behind it would check whether the to-be-evalled string already exists as a key and, if so, simply return the sub-ref stored for it; otherwise it would make a new pair.)

    I have no idea whether such a mechanism already exists under perl's shiny bonnet.

      Yup! Assuming the evaled code returns a code ref,
          my $code_ref = $code_cache{$code} ||= eval $code;
          die("Unable to compile and run code: $@\n") if !$code_ref;
          $code_ref->();
        That was the basic idea,
        ( the ||= eval $code part would also be stored in the %code_cache )
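        Put together, the caching idea discussed above might look roughly like the sketch below. The helper name compile_cached and the $compiles counter are hypothetical and exist only for illustration; the sketch assumes every source string evaluates to a code ref.

```perl
use strict;
use warnings;

my %code_cache;      # source text => compiled code ref
my $compiles = 0;    # counts real evals, for demonstration only

# compile_cached() is a hypothetical helper name, not an existing Perl feature.
sub compile_cached {
    my ($source) = @_;

    # Reuse the cached sub if this exact source string was seen before;
    # otherwise eval it once and remember the result.
    return $code_cache{$source} ||= do {
        $compiles++;
        my $code_ref = eval $source;
        die "Unable to compile code: $@\n" unless ref($code_ref) eq 'CODE';
        $code_ref;
    };
}

my $source = 'sub { my ($x) = @_; return $x * 2 }';
my $first  = compile_cached($source);
my $second = compile_cached($source);

print $first->(21), "\n";
print "compiled $compiles time(s)\n";
print "same sub: ", ($first == $second ? "yes" : "no"), "\n";
```

        Because the source text itself is the hash key, two strings that differ only in whitespace are cached as separate entries.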