in reply to Re: Wide characters and UTF8
in thread Wide characters and UTF8

Thanks - that has got me much further...

Just so I am clear: is the :encoding layer in open telling Perl how the file is currently encoded, or is it instructing Perl to encode the data?

Re^3: Wide characters and UTF8
by NERDVANA (Priest) on Nov 08, 2023 at 19:31 UTC
    The important thing to know about Perl's Unicode support is that Perl does not track whether a scalar holds bytes or characters. You, the programmer, need to keep track of whether you have a string of bytes or a string of Unicode characters. The easiest way to do this is to always decode bytes (such as UTF-8 or UTF-16) into characters the moment they enter the program, as your ":encoding(UTF-8)" mode does.

    As it happens, the decode_json function expects UTF-8 encoded bytes as input, on the assumption that you haven't decoded them yet, and it both decodes UTF-8 and parses JSON in one step. On the other hand, JSON->new->decode($string) assumes you have provided it with a string of Unicode characters.

    So in summary:

    open my $fh, '<', $filename or die $!; my $bytes = do { local $/; <$fh> }; my $data = decode_json($bytes);
    or
    open my $fh, '<:encoding(UTF-8)', $filename or die $!; my $chars = do { local $/; <$fh> }; my $data = JSON->new->decode($chars);
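
    The two routes above can be compared side by side. This is only a minimal sketch: it assumes JSON::PP (shipped with Perl, and used here in place of JSON so nothing needs installing) behaves the same way for this purpose, and the temporary file and its contents are made-up sample data.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);   # assumption: core JSON::PP standing in for JSON
use File::Temp qw(tempfile);

# Write a small UTF-8 encoded JSON file to work with (hypothetical sample data).
my ($out, $filename) = tempfile(UNLINK => 1);
binmode $out, ':raw';
print {$out} qq({"name":"caf\xC3\xA9"}\n);   # "é" as the two UTF-8 bytes C3 A9
close $out or die "close: $!";

# Route 1: read raw bytes and let decode_json do the UTF-8 decoding.
open my $raw, '<:raw', $filename or die "open: $!";
my $bytes = do { local $/; <$raw> };         # slurp the whole file
close $raw;
my $from_bytes = decode_json($bytes);

# Route 2: decode on input, then use a decoder that expects characters.
open my $txt, '<:encoding(UTF-8)', $filename or die "open: $!";
my $chars = do { local $/; <$txt> };
close $txt;
my $from_chars = JSON::PP->new->decode($chars);

# Both routes end with the same 4-character string.
printf "%d %d\n", length($from_bytes->{name}), length($from_chars->{name});   # prints "4 4"
```

    Mixing the routes is where trouble starts: handing already-decoded characters to decode_json, or raw bytes to JSON->new->decode, typically produces either a "Wide character" error or doubly decoded text.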
Re^3: Wide characters and UTF8
by Corion (Patriarch) on Nov 08, 2023 at 17:53 UTC

    The :encoding layer tells Perl what encoding the data in the file is in. When reading, Perl decodes the data and gives you Unicode strings; when writing through the same layer, it encodes your Unicode strings into that encoding.
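
    A small round trip may make the direction of the layer easier to see. This is a sketch only: the temporary file and the sample string are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $text = "caf\x{e9}";   # 4 characters; U+00E9 is "é"

# Writing: the layer encodes characters into UTF-8 bytes on the way out.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;
open my $out, '>:encoding(UTF-8)', $filename or die "open: $!";
print {$out} $text;
close $out or die "close: $!";

# The file on disk holds 5 bytes, because "é" takes two bytes in UTF-8.
my $size_on_disk = -s $filename;

# Reading: the same layer decodes those bytes back into characters.
open my $in, '<:encoding(UTF-8)', $filename or die "open: $!";
my $read_back = do { local $/; <$in> };
close $in;

printf "%d bytes on disk, %d characters read\n", $size_on_disk, length($read_back);
# prints "5 bytes on disk, 4 characters read"
```

    In both directions the layer names the encoding of the file; your program only ever sees characters.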