in reply to UTF8 hash key downgraded when assigned

Hi, I am not seeing the same behaviour as you. (Also, keep in mind that one almost never has to muck around with, or even think about, Perl's internal UTF8 flag; it has little to do with how the string is used.) I'm dumping as you did, but note that Test::utf8 already provides a test for this, is_flagged_utf8.


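use Test::Most tests => 6;   # 3 assertions per string, 2 strings
use Test::utf8;              # provides is_flagged_utf8()
use utf8;                    # the source contains literal UTF-8 characters
binmode(STDOUT, ':utf8');    # encode the printed key as UTF-8
use Devel::Peek;             # provides Dump(), which writes to STDERR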

for my $str ('clé', '键') {
    is_flagged_utf8($str);
    Dump $str;

    my %hash    = ($str => 0);
    $hash{$str} = 1;
    (my $key)   = keys %hash;

    is_flagged_utf8($key);
    Dump $key;

    $key =~ s/(?:clé|键)/ключ/;

    is_flagged_utf8($key);
    Dump $key;

    print "$key\n";
}

__END__

Output (the lines to look at in the dumps are the FLAGS lines):
$ prove -lrv 1226566.pl
1226566.pl .. 
1..6
ok 1 - flagged as utf8
ok 2 - flagged as utf8
ok 3 - flagged as utf8
ключ
ok 4 - flagged as utf8
ok 5 - flagged as utf8
ok 6 - flagged as utf8
ключ
SV = PV(0x556a75a9e160) at 0x556a75ac3fe8
  REFCNT = 2
  FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK,UTF8)
  PV = 0x556a7667b370 "cl\303\251"\0 UTF8 "cl\x{e9}"
  CUR = 4
  LEN = 10
  COW_REFCNT = 0
SV = PV(0x556a765f31d0) at 0x556a7652d1a8
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x556a76698180 "cl\303\251"\0 UTF8 "cl\x{e9}"
  CUR = 4
  LEN = 5
SV = PV(0x556a765f31d0) at 0x556a7652d1a8
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x556a764c8b30 "\320\272\320\273\321\216\321\207"\0 UTF8 "\x{43a}\x{43b}\x{44e}\x{447}"
  CUR = 8
  LEN = 16
SV = PV(0x556a763289f0) at 0x556a75ac3f28
  REFCNT = 2
  FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK,UTF8)
  PV = 0x556a765197e0 "\351\224\256"\0 UTF8 "\x{952e}"
  CUR = 3
  LEN = 10
  COW_REFCNT = 0
SV = PV(0x556a765f31d0) at 0x556a7652d1a8
  REFCNT = 1
  FLAGS = (POK,IsCOW,pPOK,UTF8)
  PV = 0x556a75ac7a80 "\351\224\256" UTF8 "\x{952e}"
  CUR = 3
  LEN = 0
SV = PV(0x556a765f31d0) at 0x556a7652d1a8
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x556a7666eec0 "\320\272\320\273\321\216\321\207"\0 UTF8 "\x{43a}\x{43b}\x{44e}\x{447}"
  CUR = 8
  LEN = 16
ok
All tests successful.
Files=1, Tests=6,  0 wallclock secs ( 0.01 usr  0.00 sys +  0.06 cusr  0.00 csys =  0.07 CPU)
Result: PASS
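
As an aside on the point above that the internal flag has little to do with how the string is used, here is a minimal sketch (not taken from the original thread; the variable names are made up for illustration). It assumes only the core utf8:: functions, and shows that two copies of the same text with different internal representations still compare equal and address the same hash entry.

```perl
use strict;
use warnings;

# "cl\x{e9}" is 'clé'; written with an escape so the example does not
# depend on the encoding of the source file.
my $downgraded = "cl\x{e9}";    # stored as native 8-bit bytes, flag off
my $upgraded   = $downgraded;
utf8::upgrade($upgraded);       # same characters, stored as UTF-8, flag on

# The internal flag differs between the two copies ...
my $flag_down = utf8::is_utf8($downgraded) ? 1 : 0;
my $flag_up   = utf8::is_utf8($upgraded)   ? 1 : 0;

# ... but at the Perl level they are the same string.
my $equal    = $upgraded eq $downgraded ? 1 : 0;
my $len_down = length $downgraded;
my $len_up   = length $upgraded;

# Hash keys are matched by characters, not by internal representation.
my %hash   = ($downgraded => 'found');
my $lookup = $hash{$upgraded};

print "flags: down=$flag_down up=$flag_up\n";      # flags: down=0 up=1
print "equal: $equal\n";                           # equal: 1
print "lengths: $len_down $len_up\n";              # lengths: 3 3
print "lookup: ", $lookup // 'undef', "\n";        # lookup: found
```

So even if a key did come back with a different flag, code that only compares, prints (through a proper encoding layer) or looks up the string should not be able to tell the difference.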

Hope this helps!


The way forward always starts with a minimal test.