I am parsing the JSON file returned by a service, and I can not get rid of the error "malformed UTF-8 character in JSON string". Any advice is welcomed. I see there are strange escapes in the JSON, but this is how data are returned....

use LWP::UserAgent; use JSON; use Data::Dumper; my $url = "http://127.0.0.1:8000/complete/"; my $data = {text => "my word"}; my $ua = LWP::UserAgent->new; # Creating a POST request my $req = HTTP::Request->new(POST => $url); $req->header('Content-Type' => 'application/json'); $req->content(encode_json($data)); # Sending request my $response = $ua->request($req); print Dumper $response; if ($response->is_success) { print "JSON data was successfully returned!\n"; my $content = decode_json($response->content); # Extract the JSON string containing the terms my $json_string = $content->{'response'}{'choices'}[0]{'message'}{ +'content'}; $json_string = (split "\n\n", $json_string)[0]; # Load the JSON string into a hash my $json_data = decode_json($json_string); # Extract the terms from the hash my $terms = $json_data->{'terms'}; # Print the terms foreach my $term (@$terms) { print "$term\n"; } } else { print "Error: " . $response->status_line . " - " . $response->cont +ent . "\n"; }

This is what I get

$VAR1 = bless( { '_protocol' => 'HTTP/1.1', '_content' => '{"status":"SUCCESS","data":"promt","re +sponse":{"id":"xxx","object":"chat.completion","created":1690752087," +model":"gpt-3.5-turbo-0613","choices":[{"index":0,"message":{"role":" +assistant","content":"{\\n \\"related_words\\": [\\n \\"Bundeskan +zlerin\\",\\n \\"Politik\\",\\n \\"Deutschland\\",\\n \\"CDU +\\",\\n \\"Kanzleramt\\",\\n \\"Bundesregierung\\",\\n \\"Po +litikerin\\",\\n \\"Bundestag\\",\\n \\"Regierung\\",\\n \\" +Partei\\",\\n \\"Wahl\\",\\n \\"Bundeskanzler\\",\\n \\"Euro +päische Union\\",\\n \\"Bundesrepublik\\",\\n \\"Europa\\",\\n + \\"Führungsperson\\",\\n \\"Staatschefin\\",\\n \\"Frauenpol +itik\\",\\n \\"G8-Gipfel\\",\\n \\"Macht\\"\\n ]\\n}"},"finish +_reason":"stop"}],"usage":{"prompt_tokens":48,"completion_tokens":145 +,"total_tokens":193}}}', '_msg' => 'OK', '_request' => bless( { '_content' => '{"text":"prompt +"}', '_headers' => bless( { 'user-a +gent' => 'libwww-perl/6.62', 'conten +t-type' => 'application/json' }, 'HTTP: +:Headers' ), '_uri_canonical' => bless( do{ +\(my $o = 'http://127.0.0.1:8000/complete/')}, 'URI::http' ), '_method' => 'POST', '_uri' => $VAR1->{'_request'}{ +'_uri_canonical'} }, 'HTTP::Request' ), '_rc' => '200', '_headers' => bless( { 'client-peer' => '127.0.0.1:80 +00', 'client-response-num' => 1, '::std_case' => { 'client-date +' => 'Client-Date', 'client-peer +' => 'Client-Peer', 'client-resp +onse-num' => 'Client-Response-Num' }, 'content-length' => '894', 'connection' => 'close', 'date' => 'Sun, 30 Jul 2023 21 +:21:25 GMT', 'client-date' => 'Sun, 30 Jul +2023 21:21:31 GMT', 'content-type' => 'application +/json', 'server' => 'uvicorn' }, 'HTTP::Headers' ) }, 'HTTP::Response' ); JSON data was successfully returned! malformed UTF-8 character in JSON string, at character offset 242 (bef +ore "\x{fffd}che Union",\n...") at /Users/post.pl line 29.

PS: my code is ported from Python. In the Python code there is no complaining about malformed UTF-8


In reply to malformed UTF-8 character in JSON string by Takamoto

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.