Re: Wide characters and UTF8

The documentation for decode_json says it takes a UTF8 encoded string.

This is the doc:

decode_json
$perl_scalar = decode_json $json_text
The opposite of encode_json: expects an UTF-8 (binary) string and tries to parse that as an UTF-8 encoded JSON text, returning the resulting reference. Croaks on error.

(Emphasis mine)

decode_json expects BYTES, not UTF-8 CHARACTERS. Feed it the non-decoded file contents (i.e. open the file raw, not with :encoding) and everything should work.
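To make the difference concrete, here is a minimal sketch. It assumes the core JSON::PP module as a stand-in for whichever JSON module you are using (its `decode_json` is documented with the same byte-string contract), and the sample JSON text is made up for illustration.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # assumption: core JSON::PP standing in for your JSON module
use Encode   qw(decode);

# UTF-8 encoded JSON as raw bytes, the way a file opened without an
# :encoding layer would hand it over (hypothetical sample data).
my $bytes = qq({"name":"caf\xC3\xA9 \xE2\x82\xAC"});

# Bytes in, Perl data out: the string values come back as characters.
my $data = decode_json($bytes);
print "characters in name: ", length($data->{name}), "\n";   # "café €" is 6 characters

# The same text, already decoded to characters, is the wrong input type
# for decode_json: it croaks instead of returning data.
my $chars = decode('UTF-8', $bytes);
my $ok    = eval { decode_json($chars); 1 };
print $ok ? "decoded\n" : "croaked: $@";
```

The second call is roughly what happens when the file is read through `:encoding(UTF-8)` and the result is passed to `decode_json`.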

Alexander

--
Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)


Re^2: Wide characters and UTF8 by Bod (Parson) on Nov 08, 2023 at 17:50 UTC
Thanks - that has got me much further... Just so I am clear: is the `:encoding` in `open` telling Perl how the file is currently encoded, or is it instructing Perl to encode the data?
Re^3: Wide characters and UTF8 by NERDVANA (Priest) on Nov 08, 2023 at 19:31 UTC
The important thing to know about Perl's Unicode support is that Perl does not track whether a scalar holds bytes or characters. You, the programmer, need to keep track of whether you have a string of bytes or a string of Unicode characters. The easiest way to do this is to always decode bytes (such as UTF-8 or UTF-16) into characters the moment they enter the program, as your `":encoding(UTF-8)"` mode does. As it happens, the `decode_json` function expects bytes as input, assuming you haven't done the decoding yet, and it then decodes UTF-8 and parses JSON in one step. On the other hand, `JSON->new->decode($string)` assumes you are giving it a string of Unicode characters.

So in summary, either
`open my $fh, '<:raw', $filename or die $!; my $bytes = do { local $/; <$fh> }; my $data = decode_json($bytes);`
or
`open my $fh, '<:encoding(UTF-8)', $filename or die $!; my $chars = do { local $/; <$fh> }; my $data = JSON->new->decode($chars);`
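For anyone who wants to try both routes side by side, here is a self-contained sketch. It assumes core JSON::PP in place of JSON, and the temp file and its contents are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use JSON::PP   qw(decode_json);   # assumption: core JSON::PP standing in for JSON

# Write a small UTF-8 encoded JSON file to work with (hypothetical sample data).
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} qq({"city":"M\xC3\xBCnchen","price":"5 \xE2\x82\xAC"}\n);
close $out or die "close: $!";

# Route 1: read raw bytes; decode_json does the UTF-8 decoding and the parsing.
open my $raw_fh, '<:raw', $filename or die "open $filename: $!";
my $bytes = do { local $/; <$raw_fh> };   # slurp the whole file
close $raw_fh;
my $from_bytes = decode_json($bytes);

# Route 2: let the I/O layer decode, then use a decoder without the utf8 option.
open my $char_fh, '<:encoding(UTF-8)', $filename or die "open $filename: $!";
my $chars = do { local $/; <$char_fh> };
close $char_fh;
my $from_chars = JSON::PP->new->decode($chars);

# Either way you end up with the same character strings in the data structure.
my $same = $from_bytes->{city}  eq $from_chars->{city}
        && $from_bytes->{price} eq $from_chars->{price};
print $same ? "both routes agree\n" : "routes differ\n";
```

The point is to pick one route per input and stick with it; mixing them (layer-decoded text into `decode_json`, or raw bytes into `->decode`) is where the trouble starts.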
Re^3: Wide characters and UTF8 by Corion (Patriarch) on Nov 08, 2023 at 17:53 UTC
The `:encoding` tells Perl what encoding the data in the file is in, and Perl will then decode the data and give you Unicode strings.
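The layer works in both directions: on a read handle it decodes, on a write handle it encodes. A small sketch to check that, using only core modules; the temp file and the sample text are made up.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

# A character string: n a ï v e, a space, and the euro sign (7 characters).
my $text = "na\x{EF}ve \x{20AC}";

# Writing through the layer: characters are encoded to UTF-8 bytes on the way out.
open my $write_fh, '>:encoding(UTF-8)', $filename or die "open $filename: $!";
print {$write_fh} $text;
close $write_fh or die "close: $!";

# Reading through the layer: bytes are decoded back to characters.
open my $char_fh, '<:encoding(UTF-8)', $filename or die "open $filename: $!";
my $chars = do { local $/; <$char_fh> };
close $char_fh;

# Reading raw: you get the bytes exactly as stored on disk.
open my $raw_fh, '<:raw', $filename or die "open $filename: $!";
my $bytes = do { local $/; <$raw_fh> };
close $raw_fh;

# ï takes 2 bytes and the euro sign takes 3, so 7 characters become 10 bytes.
printf "chars=%d bytes=%d\n", length $chars, length $bytes;
```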