
It looks like your Ã© is the result of printing the UTF-8-encoded bytes of é. You need to print the decoded value instead. For example:
perl -we 'use Encode; $c = encode("UTF-8", "é"); $dc = decode("UTF-8", $c); print "\$c = $c \$dc = $dc\n"'
outputs: $c = Ã© $dc = é
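To make the difference between characters and bytes visible without depending on the terminal, here is a minimal sketch that only looks at lengths and byte values. It assumes nothing beyond the core Encode module; the variable names are just for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars   = "\x{E9}";                   # "é" as one character (U+00E9)
my $bytes   = encode('UTF-8', $chars);    # two bytes: 0xC3 0xA9
my $decoded = decode('UTF-8', $bytes);    # one character again

# Encoding data that is already UTF-8 bytes treats each byte as a Latin-1
# character and encodes it a second time, which is what shows up as "Ã©".
my $double  = encode('UTF-8', $bytes);    # four bytes: C3 83 C2 A9

printf "chars=%d bytes=%d decoded=%d double=%d\n",
    length($chars), length($bytes), length($decoded), length($double);
printf "double = %s\n",
    join ' ', map { sprintf '%02X', ord } split //, $double;
```

In the one-liner above there is no "use utf8", so on a UTF-8 terminal the literal "é" already reaches Perl as two bytes. That is why encode() produces the doubled form there and decode() merely gets back to the original bytes.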

Re^4: Yet another Encoding issue...
by Bod (Parson) on Jun 01, 2024 at 21:48 UTC

    That makes sense thanks!

    $reply->{'response'} = decode('UTF-8', $data{'userChat'}); seems to have done the trick on the test script...

    So that's one problem solved. It seems I'm now getting encoding problems from AI::Chat, but only when I call the chat method, not when I call the prompt method. But that doesn't make a lot of sense as prompt uses chat...

    I'll have to try and simplify the code and see if I can reproduce it!

    Update:

    This is the sort of thing I'm getting back from AI::Chat

    {response: 'Ã\x83Â\x96zÃ\x83¼r dilerim, belki de sorumu yanlÃ\x84±Ã\x85Â\x9F s…rirken en keyif aldÃ\x84±Ã\x84Â\x9FÃ\x84±nÃ\x84±z Ã\x85Â\x9Fey ne?\n'}

    Another Update:

    { correction: 'Turkce alfabe oldukca turaf.\n\nThe correct sentence should be: "Türkçe alfabesi oldukça tuhaf."\n\nExplanation:\n1. The word "Türkçe" is not capitalized, it should be as it's a proper noun.\n2. The word "alfabe" is also missing its possessive suffix, it should be "alfabesi" to show that it belongs to Turkish language.\n3. The word "turaf" is not a word in Turkish. The correct word meaning "strange" or "weird" is "tuhaf".', response: 'Evet, Türk alfabesi Latin alfabesine dayanır ve 29 harften oluşur. Her harfin belirli bir sesi temsil ettiğini biliyor muydun?' }

    The correction comes from the prompt method and its characters display correctly, whereas the response comes from chat and is unreadable...
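A sequence like 'Ã\x83Â\x96' is what "Ö" looks like after being UTF-8-encoded more than once, so one way to narrow this down is to count how many times the data can be decoded before it stops being valid UTF-8. The sketch below does that. peel_utf8 is a hypothetical helper, not part of Encode or AI::Chat, and whether the chat path really encodes an extra time is an assumption that the thread has not confirmed. It is also only a heuristic: genuine text made of Latin-1 characters that happen to form valid UTF-8 would be peeled too.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: strip up to $max layers of UTF-8 encoding.
# Returns the resulting text and the number of layers removed.
sub peel_utf8 {
    my ($data, $max) = @_;
    my $layers = 0;
    while ($layers < $max) {
        my $copy = $data;    # decode() with a CHECK value may modify its input
        my $next = eval { decode('UTF-8', $copy, Encode::FB_CROAK()) };
        # Stop when the data is no longer valid UTF-8 or no longer changes.
        last if !defined $next || $next eq $data;
        $data = $next;
        $layers++;
    }
    return ($data, $layers);
}

my $text  = "\x{D6}z\x{FC}r";            # "Özür" as characters
my $once  = encode('UTF-8', $text);      # what should go over the wire
my $twice = encode('UTF-8', $once);      # encoded a second time by mistake

my ($clean, $layers) = peel_utf8($twice, 5);
printf "layers=%d codepoints=%s\n", $layers,
    join ' ', map { sprintf 'U+%04X', ord } split //, $clean;
```

If data coming back from chat reports two layers while data from prompt reports one, that would point at an extra encode (or a missing decode) somewhere on the chat path.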

      By the way, depending on what charset you are specifying in your html you may get problems. For example, the little CGI script:
      #!/bin/bash
      echo "Content-Type: text/html"
      echo ""
      perl -we 'use Encode; $c = encode("UTF-8", "é"); $dc = decode("UTF-8", $c); print "\$c = $c \$dc = $dc\n"'
      displays as: $c = Ãƒ© $dc = Ã©

      But if you fix the encoding like:

      #!/bin/bash
      echo "Content-Type: text/html; charset=UTF-8"
      echo ""
      perl -we 'use Encode; $c = encode("UTF-8", "é"); $dc = decode("UTF-8", $c); print "\$c = $c \$dc = $dc\n"'
      it displays as: $c = Ã© $dc = é
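The same idea can be written as a plain Perl CGI script: decode bytes on the way in, work with characters, and encode exactly once on the way out while declaring that encoding in the header. This is only a sketch. The $raw value stands in for whatever bytes a form field or API would deliver and is assumed to be UTF-8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for incoming data (assumed to be UTF-8 bytes for "é").
my $raw  = "\xC3\xA9";
my $text = decode('UTF-8', $raw);    # one character, U+00E9

# Let the output layer do the single encode step, and tell the browser
# which encoding the bytes are in.
binmode STDOUT, ':encoding(UTF-8)';
print "Content-Type: text/html; charset=UTF-8\r\n\r\n";
print "<p>$text</p>\n";
```

With the :encoding(UTF-8) layer on STDOUT, calling encode() on $text before printing would encode it twice, which is the same mistake as in the earlier output.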

        I'm using UTF-8

        Content-type: text/html; charset=UTF-8
        <html>
        <meta charset="UTF-8">