Esteemed monks, I have a problem with the encoding of an emoji in Perl. The string "Test 😀" is read from a MySQL database table whose character set is latin1. Obviously that's not a UTF-8 character set, but my console seems smart enough to detect the intended output, as a plain "select * from table" in the "mysql" client displays the "grinning face" emoji correctly.

My Perl (version 5.26) program then logs the text to a log file, and running "tail -f" on the log file again displays the emoji correctly. I also log the bytes using sprintf( "%vX", $text ), which prints "54.65.73.74.20.F0.9F.98.80". So the bytes for the emoji are there, in "F0.9F.98.80".
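For reference, a minimal sketch of what that "%vX" dump can distinguish, using only the core Encode module (the variable names $bytes and $chars are illustrative, not from my program). If the string held decoded characters, the emoji would show up as the single code point 1F600. Four separate values F0.9F.98.80 suggest the string still holds raw UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes, as they would arrive undecoded from the database
# (assumption: this mirrors what $text holds in the program).
my $bytes = "Test \xF0\x9F\x98\x80";

# The same text decoded into Perl characters.
my $chars = decode('UTF-8', $bytes);

my $dump_bytes = sprintf("%vX", $bytes);
my $dump_chars = sprintf("%vX", $chars);

print "bytes: $dump_bytes (length ", length($bytes), ")\n";
print "chars: $dump_chars (length ", length($chars), ")\n";
```

The first dump matches the logged output, so $text appears to be a byte string rather than a character string.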

The text is then JSON-encoded (using the JSON library) and sent with $conn->send_utf8() to a websocket client using Net::WebSocket::Server. However, the websocket client (running in a web browser) receives "Test 😀". I've tried encode( 'UTF-8', $text ), which did not fix the problem.
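A hedged sketch of one way this pipeline can go wrong, and one possible way around it. It assumes that $text holds undecoded UTF-8 bytes and that send_utf8() UTF-8-encodes the string itself, which is my reading of the Net::WebSocket::Server documentation. It uses the core JSON::PP module as a stand-in for the JSON library, and the variable names are illustrative. Encoding a byte string a second time turns each byte into its own two-byte sequence, whereas decoding first and encoding exactly once keeps the emoji intact.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use JSON::PP;

# Assumption: the text arrives from the database as raw UTF-8 bytes.
my $bytes = "Test \xF0\x9F\x98\x80";

# Encoding bytes again treats each byte as a Latin-1 character,
# so the four emoji bytes become eight (double encoding).
my $double      = encode('UTF-8', $bytes);
my $dump_double = sprintf("%vX", substr($double, 5));
print "double-encoded emoji: $dump_double\n";

# Decode once on input so Perl sees one character, U+1F600.
my $chars = decode('UTF-8', $bytes);

# Without ->utf8, JSON::PP returns a character string, leaving the
# single UTF-8 encoding step to the sender.
my $payload = JSON::PP->new->encode({ text => $chars });

# $conn->send_utf8($payload);   # assumed to encode to UTF-8 on send

# Simulate that final encoding step to inspect the wire bytes.
my $wire = encode('UTF-8', $payload);
print "emoji intact on the wire\n"
    if index($wire, "\xF0\x9F\x98\x80") >= 0;
```

The alternative would be to produce UTF-8 bytes with the JSON encoder and send them without any further encoding. Either way, the aim is that the text is encoded exactly once between the database and the browser.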

The whole subject of character encoding is not an easy one, and mixing MySQL with Perl with websockets (with JSON for good measure) has made it tricky to tell where the problem is.

Can anyone help me find out why the websocket client doesn't receive the emoji correctly, please?
