in reply to Hash key string restrictions?

You should be able to use any string.

A string of characters and its UTF-8 encoding are even treated as distinct strings, even though strings of characters are stored as UTF-8 internally.
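A minimal sketch of that point, assuming a Perl 5.8 or later with the core Encode module. The variable names ($chars, $bytes, $key_count) are made up for illustration. It stores one value under a character string and another under that string's UTF-8 encoding, then counts the keys:

```perl
use strict;
use warnings;
use Encode qw( encode );

# A character string: a single code point, U+2660 BLACK SPADE SUIT.
my $chars = "\x{2660}";

# Its UTF-8 encoding: the three bytes E2 99 A0.
my $bytes = encode('UTF-8', $chars);

my %hash;
$hash{$chars} = 'characters';
$hash{$bytes} = 'encoded bytes';

# The two strings differ in length, so they cannot be the same key.
my $key_count = keys %hash;
print "keys: $key_count\n";
printf "length(chars) = %d, length(bytes) = %d\n",
    length($chars), length($bytes);
```

If the two were treated as the same key, the second assignment would overwrite the first and the hash would hold a single entry.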

Re^2: Hash key string restrictions?
by bart (Canon) on Feb 10, 2007 at 08:04 UTC
    A string of characters and its UTF-8 encoding are even treated as distinct strings, even though strings of characters are stored as UTF-8 internally.
    Yowza! That doesn't sound right, at least not in the Perl 5.8.x world. On 5.6.x, I guess it's kind of normal that a UTF-8 string and a byte string are not the same.

    So what exactly is the situation? Does anyone with a clue about the actual implementation feel like explaining what the details are really like? Or, if no one comes forward, would anyone with a lot of courage, time, and a real hunger for knowledge feel like spitting this out?

      That doesn't sound right

      I beg to differ. They are not the same. Hashes should consider them different just like eq does. In perlguts speak, the UTF8 flag is preserved and observed (unlike the taint flag). That flag is a very important piece of information, and it should not be ignored.

      Strings of chars and strings of encoded chars (no matter what the encoding) are not the same thing.

      use Test::Simple tests => 6;

      use strict;
      use warnings;

      #use Devel::Peek qw( Dump );
      use Encode qw( encode _utf8_on _utf8_off );

      {
          ok(1);

          # Four code points, and the UTF-8 encoding of those code points.
          my $chars_suites = "\x{2660}\x{2661}\x{2662}\x{2663}";
          my $utf8_suites  = encode('UTF-8', $chars_suites);

          # The same two strings again, built by flipping the UTF8 flag.
          _utf8_off(my $utf8_suites2  = $chars_suites);
          _utf8_on(my $chars_suites2 = $utf8_suites);

          # # Make sure they're the same, UTF8 flag notwithstanding.
          # print("\n");
          # print("chars_suites:\n");  Dump($chars_suites);
          # print("\n");
          # print("chars_suites2\n");  Dump($chars_suites2);
          # print("\n");
          # print("utf8_suites:\n");   Dump($utf8_suites);
          # print("\n");
          # print("utf8_suites2\n");   Dump($utf8_suites2);
          # print("\n");

          ok($chars_suites  eq $chars_suites2);
          ok($utf8_suites   eq $utf8_suites2);
          ok($chars_suites  ne $utf8_suites);
          ok($chars_suites2 ne $utf8_suites2);

          # Character keys and byte keys must land in separate hash entries.
          my %hash;
          $hash{$chars_suites}  .= 'a';
          $hash{$chars_suites2} .= 'b';
          $hash{$utf8_suites}   .= 'c';
          $hash{$utf8_suites2}  .= 'd';

          ok($hash{$chars_suites} eq 'ab' && $hash{$utf8_suites} eq 'cd');
      }