http://qs1969.pair.com?node_id=11136927


in reply to How to convert hash keys to utf8

Calling utf8_on is almost never the right thing to do. Can you explain what problem this is solving for you?

To elaborate a bit on the flag, it only tells perl’s string implementation how to read the bytes in the string’s buffer: one byte per character (Latin-1), or as UTF-8. If you change the flag by hand you’re probably just breaking things, unless you know for a fact that the bytes of the string are already a valid UTF-8 sequence.

If you want to take a string of Latin-1 characters and make sure they are represented as UTF-8 before handing those bytes to Python, you should be calling utf8::upgrade. If you read some UTF-8 from a file handle and want the string to understand that it contains characters and not bytes, you should call utf8::decode. Because perl uses UTF-8 internally, calling utf8::decode doesn’t actually change any bytes (the first time you call it) and is just like calling utf8_on, except that it also verifies that the string contains valid UTF-8.
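To make the difference between the two calls concrete, here is a small sketch. The sample strings and variable names are my own, and it assumes a source file without `use utf8`, so `"\xE9"` starts out as a single Latin-1 byte.

```perl
use strict;
use warnings;

# A Latin-1 string: "e with acute" stored as the single byte 0xE9, flag off.
my $latin1 = "caf\xE9";
my $before = utf8::is_utf8($latin1) ? 1 : 0;
utf8::upgrade($latin1);                    # re-encode the internal buffer as UTF-8
my $after  = utf8::is_utf8($latin1) ? 1 : 0;
# Same four characters as before; only the internal representation changed.
my $same_chars = ($latin1 eq "caf\xE9" && length($latin1) == 4) ? 1 : 0;

# Octets as they might arrive from a file handle with no encoding layer:
# the UTF-8 encoding of the same word.
my $bytes     = "caf\xC3\xA9";
my $len_bytes = length($bytes);            # counts 5 bytes
my $ok        = utf8::decode($bytes) ? 1 : 0;   # validate, then treat as characters
my $len_chars = length($bytes);            # now counts 4 characters

# A lone 0xE9 is not valid UTF-8, so decode refuses and leaves the string alone.
my $bad    = "caf\xE9";
my $bad_ok = utf8::decode($bad) ? 1 : 0;

print "upgrade: flag $before -> $after, same characters: $same_chars\n";
print "decode:  ok=$ok, length $len_bytes -> $len_chars\n";
print "decode of invalid bytes: ok=$bad_ok, length ", length($bad), "\n";
```

Note that upgrade never changes what the string means to Perl code, while decode does: the length drops because two bytes become one character.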

Edit: Whoops, I didn’t read that carefully enough. I was expecting that the module you linked was recursively setting the utf8 flag in the manner of SvUTF8_on. Actually it does call upgrade, and the author just chose the name poorly.

It sounds sort of like you are saying that Inline::Python refuses to serialize string data unless it has the utf8 flag set on the string. This sounds like a bug in Inline::Python. The correct serialization pattern would be to upgrade the string as it was getting serialized, preferably only on the bytes being moved and without altering the original SV.
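As a rough pure-perl analogy for that pattern (this is not how Inline::Python works internally, and `utf8_bytes_of` is a hypothetical helper name of mine), the idea is to produce the UTF-8 octets from a copy, so the caller's scalar keeps whatever representation it had:

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 octets for a string without
# modifying the caller's scalar.
sub utf8_bytes_of {
    my ($str) = @_;        # copies the argument; the original SV is not touched
    utf8::encode($str);    # in place on the copy: characters -> UTF-8 octets
    return $str;
}

my $plain    = "caf\xE9";  # Latin-1 storage, flag off
my $upgraded = "caf\xE9";
utf8::upgrade($upgraded);  # same characters, UTF-8 storage, flag on

my $bytes_plain    = utf8_bytes_of($plain);
my $bytes_upgraded = utf8_bytes_of($upgraded);

# Both representations should serialize to the same octets, and neither
# original should have changed its flag.
print join(' ', map { sprintf '%02x', ord } split //, $bytes_plain), "\n";
print "same bytes: ", ($bytes_plain eq $bytes_upgraded ? "yes" : "no"), "\n";
print "original flag still off: ", (utf8::is_utf8($plain) ? "no" : "yes"), "\n";
```

The point of the sketch is only that the serializer should not care which internal representation it was handed.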

For your code example, it appears the only way to upgrade hash keys (in pure perl) is to rebuild the hash:

for (keys %h) { utf8::upgrade($_); $h{$_} = $h{$_} }

Hash keys are not SV instances, and the way utf8 is indicated to hv_store is with a negative key length. When you consider that any key containing a byte above 0x7f would also need to be re-hashed, I think the only way is rebuilding the hash.
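If you want to check the effect, here is a sketch that rebuilds into a second hash instead of assigning in place, purely so that before and after can be compared side by side. The sample data and the `%upgraded` name are mine, and I only exercise keys that contain a byte above 0x7f.

```perl
use strict;
use warnings;

# Sample data (my own): keys stored as Latin-1 bytes, flag off.
my %h = ("caf\xE9" => 1, "na\xEFve" => 2);

# Rebuild: copy each key, upgrade the copy, store under the upgraded key.
my %upgraded;
for my $key (keys %h) {
    my $new_key = $key;
    utf8::upgrade($new_key);
    $upgraded{$new_key} = $h{$key};
}

# Count how many keys come back from each hash with the utf8 flag set.
my $flagged_before = grep { utf8::is_utf8($_) } keys %h;
my $flagged_after  = grep { utf8::is_utf8($_) } keys %upgraded;

# Lookups still work with either representation of the same characters.
my $lookup = $upgraded{"caf\xE9"};

print "flagged keys before: $flagged_before, after: $flagged_after\n";
print "lookup with a non-upgraded key: $lookup\n";
```

The keys still compare equal as strings either way; the only thing that changes is the representation perl hands back from keys.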