Calling utf8_on is almost never the right thing to do. Can you explain what problem this is solving for you?

To elaborate a bit on the flag: it only tells perl's string implementation whether the string's internal buffer holds one byte per character (Latin-1 rules) or a UTF-8 encoding of the characters. If you flip the flag directly, you're probably just breaking things unless you know for a fact that the bytes of the string are a valid UTF-8 sequence. If you want to take a string of Latin-1 characters and make sure they are represented as UTF-8 before handing those bytes to Python, you should call utf8::upgrade. If you read some UTF-8 from a file handle and want the string to understand that it contains characters and not bytes, you should call utf8::decode. Because perl uses UTF-8 internally, utf8::decode doesn't actually change any bytes (the first time you call it); it is just like calling utf8_on except that it also verifies that the string contains valid UTF-8.
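Here is a rough sketch of the difference between the two calls, in case it helps. The variable names and sample strings are just made up for illustration; the only real API used is utf8::upgrade, utf8::decode and utf8::is_utf8.

```perl
use strict;
use warnings;

# A Latin-1 string: U+00E9 is held internally as the single byte 0xE9.
my $latin1 = "caf\xE9";
my $flag_before = utf8::is_utf8($latin1) ? 1 : 0;   # flag is off
utf8::upgrade($latin1);       # re-encodes the internal buffer as UTF-8
my $flag_after  = utf8::is_utf8($latin1) ? 1 : 0;   # flag is now on
my $len_upgraded = length $latin1;                  # still 4 characters

# Raw octets, as you would get from a file handle with no encoding layer.
my $octets = "caf\xC3\xA9";
my $len_as_bytes = length $octets;                  # 5 bytes
my $decoded_ok   = utf8::decode($octets);           # validates, then flags
my $len_as_chars = length $octets;                  # 4 characters

# Octets that are not valid UTF-8: decode refuses and leaves them alone.
my $bad    = "caf\xE9";
my $bad_ok = utf8::decode($bad);                    # false

# Both routes end up at the same four characters.
my $same = $latin1 eq $octets;
```

Note that upgrade never changes what characters the string contains, while decode changes the meaning of the string from 5 bytes to 4 characters.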

Edit: Whoops, I didn’t read that carefully enough. I was expecting that the module you linked was recursively setting the utf8 flag in the manner of SvUTF8_on. Actually it does call upgrade, and the author just chose the name poorly.

It sounds sort of like you are saying that Inline::Python refuses to serialize string data unless it has the utf8 flag set on the string. This sounds like a bug in Inline::Python. The correct serialization pattern would be to upgrade the string as it was getting serialized, preferably only on the bytes being moved and without altering the original SV.
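To show what I mean by not altering the original, here is a pure-perl sketch. A real fix would live in the C code of Inline::Python, which I have not looked at; utf8_bytes_of is a hypothetical helper name of my own, not anything from that module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the UTF-8 octets for a string while
# leaving the caller's scalar exactly as it was.
sub utf8_bytes_of {
    my ($string) = @_;        # this copies the caller's value
    utf8::encode($string);    # in place, but only on our copy
    return $string;
}

my $original = "caf\xE9";               # Latin-1 internally, flag off
my $wire     = utf8_bytes_of($original);

my $wire_len          = length $wire;                      # 5 octets
my $original_len      = length $original;                  # 4 characters
my $original_flagged  = utf8::is_utf8($original) ? 1 : 0;  # still off
```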

For your code example, it appears the only way to upgrade hash keys (in pure perl) is to rebuild the hash:

for (keys %h) { utf8::upgrade($_); $h{$_}= $h{$_} }

Hash keys are not SV instances. The way UTF-8 is indicated to hv_store is with a negative key length, and when you consider that any key containing a byte above 0x7f would need to be re-hashed, I think the only way is rebuilding the hash.
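If you would rather not modify the hash while looping over it, a variation is to build a second hash. This is only a sketch with made-up data (%h, %upgraded and the sample key are my own names), and I have only reasoned through the non-ASCII key case:

```perl
use strict;
use warnings;

my %h = ("caf\xE9" => 'coffee');   # key is Latin-1, flag off

my %upgraded;
for my $key (keys %h) {
    my $value = $h{$key};          # look up with the original key first
    utf8::upgrade($key);           # $key is a copy, so %h is untouched
    $upgraded{$key} = $value;      # store under the upgraded key
}

# The key that comes back out of the new hash carries the flag.
my ($new_key)   = keys %upgraded;
my $key_flagged = utf8::is_utf8($new_key) ? 1 : 0;
my $value_kept  = $upgraded{"caf\xE9"};   # same characters, so same entry
```

The lookup on the last line works with a non-upgraded key because hash lookups compare characters, not the internal representation.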

In reply to Re: How to convert hash keys to utf8 by NERDVANA
in thread How to convert hash keys to utf8 by mpersico
