PerlMonks  

Calling utf8_on is almost never the right thing to do. Can you explain what problem this is solving for you?

To elaborate a bit on the flag: it only tells perl's string implementation how the string's internal buffer is stored, either as one byte per character (latin1, or plain bytes) or as utf8. If you change the flag yourself, you're probably just breaking things, unless you know for a fact that the bytes in the buffer are already a valid utf8 sequence. If you want to take a string of latin1 characters and make sure they are represented as utf8 before handing those bytes to Python, you should call utf8::upgrade. If you read some utf8 from a file handle and want the string to understand that it contains characters and not bytes, you should call utf8::decode. Because perl uses utf8 internally, calling utf8::decode doesn't actually change any bytes (the first time you call it); it is just like calling utf8_on, except that it also verifies that the string contains valid utf8.
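
A minimal sketch of the difference between the two calls, using a made-up string "caf\xE9". It only uses the core utf8:: functions (utf8::upgrade, utf8::decode, utf8::is_utf8), which need no module:

```perl
use strict;
use warnings;

# A latin1 string: "é" (U+00E9) is stored as the single byte 0xE9, flag off.
my $latin1 = "caf\xE9";

# utf8::upgrade re-encodes the internal buffer as utf8 and turns the flag
# on, but the string still holds the same four characters.
my $copy = $latin1;
utf8::upgrade($copy);
printf "upgrade: length=%d same=%s flag=%s\n",
    length($copy), ($copy eq $latin1 ? 'yes' : 'no'),
    (utf8::is_utf8($copy) ? 'on' : 'off');

# Raw utf8 octets, as read from a handle with no :encoding layer:
# "é" arrives as the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";
my $chars = $bytes;
utf8::decode($chars) or die "input is not valid utf8\n";
printf "decode: %d bytes -> %d characters, same=%s\n",
    length($bytes), length($chars), ($chars eq $latin1 ? 'yes' : 'no');
```

Upgrading keeps the characters and changes only the representation; decoding turns five bytes into four characters that compare equal to the latin1 original.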

Edit: Whoops, I didn’t read that carefully enough. I was expecting that the module you linked was recursively setting the utf8 flag in the manner of SvUTF8_on. Actually it does call upgrade, and the author just chose the name poorly.

It sounds like you are saying that Inline::Python refuses to serialize string data unless the utf8 flag is set on the string. If so, that sounds like a bug in Inline::Python. The correct serialization pattern would be to upgrade the string as it is being serialized, preferably working only on the bytes being moved and without altering the original SV.
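
That pattern can be sketched in pure Perl: work on a private copy and hand out utf8 octets, leaving the caller's scalar (and its flag) alone. serialize_string is a hypothetical name, and this is not how Inline::Python is implemented; utf8::encode on the copy yields the same bytes an upgrade would put in the buffer, just with the flag turned off:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Inline::Python): return the utf8 octets
# for a string without modifying the caller's scalar.
sub serialize_string {
    my ($s) = @_;       # copy the argument; @_ itself aliases the caller's SV
    utf8::encode($s);   # characters -> utf8 octets, flag ends up off
    return $s;
}

my $name = "caf\xE9";   # latin1 string, flag off
my $wire = serialize_string($name);

printf "wire: %d bytes; original: %d chars, flag %s\n",
    length($wire), length($name), (utf8::is_utf8($name) ? 'on' : 'off');
```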

For your code example, it appears the only way to upgrade hash keys (in pure perl) is to rebuild the hash:

# re-store each entry under a utf8-upgraded copy of its key
for (keys %h) { utf8::upgrade($_); $h{$_} = $h{$_} }
Hash keys are not SV instances, and the way utf8 is indicated to hv_store is with a negative key length. When you also consider that any key containing a byte above 0x7f would need to be re-hashed, I think the only way is rebuilding the hash.
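
A small check of the rebuild loop on a made-up hash. Since perl compares hash keys by characters rather than by internal encoding, re-storing under the upgraded copy should overwrite the existing entry instead of adding a second one, and keys() should then hand back flagged strings. That last part relies on perl remembering that a key was stored from a utf8 string, so treat it as a sketch to verify on your own perl:

```perl
use strict;
use warnings;

# Made-up data: one latin1 key and one ASCII key, both stored flag-off.
my %h = ("caf\xE9" => 'latin1 key', plain => 'ascii key');

# The loop from the post. keys() returns copies, so upgrading $_ does not
# touch the stored key; the assignment re-stores the entry under the
# upgraded key, replacing the old entry for the same characters.
for (keys %h) { utf8::upgrade($_); $h{$_} = $h{$_} }

my $n   = keys %h;         # scalar context: number of entries
my $hit = $h{"caf\xE9"};   # lookup with the original, unflagged spelling

for my $k (sort keys %h) {
    printf "%d-char key: flag %s\n", length($k), (utf8::is_utf8($k) ? 'on' : 'off');
}
print "entries: $n, lookup: $hit\n";
```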

In reply to Re: How to convert hash keys to utf8 by NERDVANA
in thread How to convert hash keys to utf8 by mpersico
