in reply to Re^4: Encoding of emoji character
in thread Encoding of emoji character

Interesting. None of these characters have code points above 255, and yet you sometimes get the error in decode($text).

You said the table encoding is latin-1. My current guess is that you receive your data decoded as if it were latin-1. Most of it looks like bytes, but occasionally the latin-1 text decodes to wide characters and blows up decode (which expects only bytes). What if you encode $text back to latin-1 to get bytes, then decode those as UTF-8? This transformation should be reversible as long as all bytes round-trip, that is, as long as MySQL's interpretation of "latin-1" is the same as Perl's and assigns a meaning to all 256 possible byte values.
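
Here is a minimal sketch of what I mean, not tested against your setup. It assumes each UTF-8 byte reached you as one character with the same code point, which is how Perl's own iso-8859-1 decoding behaves; whether MySQL's "latin-1" does the same is exactly the open question. The variable names are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Assumption: the UTF-8 bytes of an emoji were decoded as if each byte
# were a single latin-1 character. Simulate that here.
my $emoji_bytes = "\xF0\x9F\x98\x80";            # UTF-8 encoding of U+1F600
my $text = decode('iso-8859-1', $emoji_bytes);   # four characters, all <= 0xFF

# Step 1: map every character back to the byte it came from.
my $bytes = encode('iso-8859-1', $text);

# Step 2: interpret those bytes as UTF-8.
my $fixed = decode('UTF-8', $bytes);

printf "U+%04X\n", ord($fixed);                  # prints U+1F600

# Round-trip check: with Perl's iso-8859-1, all 256 byte values survive
# decode followed by encode, so the repair loses nothing on the Perl side.
my $all       = join '', map { chr } 0 .. 255;
my $roundtrip = encode('iso-8859-1', decode('iso-8859-1', $all));
print $roundtrip eq $all ? "all bytes round-trip\n" : "some bytes changed\n";
```

If the text you get from the database already contains characters above 0xFF, the encode step cannot map them back to a single byte, and the result will differ from this sketch.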

Re^6: Encoding of emoji character
by soonix (Chancellor) on Jun 21, 2022 at 18:52 UTC
    I think the "wide character" doesn't simply mean "above 255(0xFF)", but "not in the character set". Latin1 defines characters in the ranges 0x00 .. 0x7E and 0xA0 .. 0xFF. Those within 0x7F .. 0x9F are "not within one of the defined ranges", thus "out of range" = "wide character".
Re^6: Encoding of emoji character
by dcunningham (Sexton) on Jun 22, 2022 at 03:40 UTC
    Thank you, this appears to have fixed it! With the following code, the emoji is passed and displayed correctly on the websocket client.
    $text = encode( 'iso-8859-1', $text );
    $text = decode( 'UTF-8', $text );
    $conn->send_utf8( $text );