in reply to How to encode and decode chinese string to iso-8859-1 encoding format
Hi, there are a couple of things I see now that I'm looking at this on a full-size screen ...
First, in the second set of statements you encode $str to $secondaryOctets but then print the output of decoding $str.
The above does not fix the issue, though. When you put use utf8; in the source code, Perl reads any non-ASCII characters in string literals as characters, rather than as a sequence of bytes. This works well for your first case, where you encode to and decode from UTF-8. But ISO-8859-1 is a single-byte encoding that can only represent code points up to 0xFF, so you get the wide character error. You should not tell Perl that the source code is in UTF-8 *if* you plan to read it in as bytes.
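To make the characters-versus-bytes point concrete, here is a minimal sketch. It assumes the file is saved as UTF-8, and the variable names $chars and $bytes are just mine for illustration. Since the utf8 pragma is lexically scoped, the two blocks see the same literal differently:

```perl
use strict;
use warnings;

my ($chars, $bytes);
{
    use utf8;     # lexically scoped: literals in this block are decoded as characters
    $chars = '這是一個測試';
}
{
    no utf8;      # literals in this block stay as the raw bytes of the source file
    $bytes = '這是一個測試';
}

# Assumption: the source file is saved as UTF-8, so each of these
# six characters occupies three bytes on disk.
printf "with use utf8:    length %d\n", length $chars;   # 6 characters
printf "without use utf8: length %d\n", length $bytes;   # 18 bytes
```

Every element of $bytes is below 256, so it fits in ISO-8859-1, while every element of $chars is a code point well above 0xFF.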
Similarly but separately, when you apply the ':utf8' IO layer to STDOUT, you are telling Perl that the output is going to be encoded in UTF-8. That's not the case when you've encoded to ISO-8859-1, so you shouldn't apply the layer.
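As a rough illustration of why the layer matters, the sketch below writes the same already-encoded octets to two in-memory filehandles, one raw and one with an encoding layer. The in-memory handles and the variable names are my own choice for the demo so the result can be measured; they are not part of the original script:

```perl
use strict;
use warnings;
use Encode qw(encode);

my $chars = "\x{9019}\x{662F}";         # two characters, written as escapes
my $bytes = encode('UTF-8', $chars);    # six octets, each in 0x80..0xFF

# Write to in-memory files so the output can be inspected afterwards.
my ($raw_out, $layered_out) = ('', '');

open my $raw, '>:raw', \$raw_out or die $!;
print {$raw} $bytes;                    # already encoded: passes through unchanged
close $raw;

open my $layered, '>:encoding(UTF-8)', \$layered_out or die $!;
print {$layered} $bytes;                # each octet is encoded again by the layer
close $layered;

printf "raw handle:     %d bytes\n", length $raw_out;       # 6
printf "layered handle: %d bytes\n", length $layered_out;   # 12
```

The layered handle treats each octet as a character and encodes it a second time, which is what shows up as garbage on the terminal.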
The following script attempts to demonstrate what I mean:
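use strict;
use warnings;
use feature 'say';
use Encode;
use Class::Unload;

{
    say 'With UTF-8';
    use utf8;                             # read string literals in this block as characters
    my $str = '這是一個測試';
    my $perl = encode("utf8", $str);      # characters -> UTF-8 octets
    binmode STDOUT, ':utf8';              # STDOUT now encodes characters as UTF-8
    say decode("utf8", $perl);
}

{
    say 'With ISO-8859-1';
    Class::Unload->unload('utf8');
    my $str = '這是一個測試';               # no utf8 pragma in this block: the literal is raw bytes
    my $perl = encode("ISO-8859-1" , $str);
    binmode STDOUT;                       # drop the :utf8 layer so bytes pass through unchanged
    say decode("ISO-8859-1", $perl);
}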
__END__
Outputs:
$ perl 1203139.pl
With UTF-8
這是一個測試
With ISO-8859-1
這是一個測試
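For what it's worth, here is a small sketch of why the ISO-8859-1 round trip only works on bytes. It is my own illustration rather than part of the script above, and it uses escapes instead of literals so it does not depend on how the file is saved:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "\x{9019}\x{662F}";               # real characters, code points above 0xFF

# With the default CHECK, characters that ISO-8859-1 cannot represent become '?'.
my $lossy = encode('ISO-8859-1', $chars);
print "characters -> ISO-8859-1: $lossy\n";   # prints ??

# UTF-8 octets are all below 256, so they survive a Latin-1 decode/encode unchanged.
my $utf8_bytes = encode('UTF-8', $chars);
my $round_trip = encode('ISO-8859-1', decode('ISO-8859-1', $utf8_bytes));
print "bytes survive the round trip\n" if $round_trip eq $utf8_bytes;

# Asking Encode to be strict turns the silent substitution into an exception.
my $ok = eval {
    encode('ISO-8859-1', $chars, Encode::FB_CROAK | Encode::LEAVE_SRC);
    1;
};
print "with FB_CROAK: ", ($ok ? "encoded" : "died"), "\n";
```

So the Chinese text is never really "in" ISO-8859-1; the UTF-8 octets are just carried through it untouched.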
Disclaimer: Working with encodings is very complicated, as you know, and I am not an expert in the field. As this example shows there can be multiple overlaying issues, and it's possible for a script to appear to be working right when it's just an accident. So while it is my best understanding, I don't guarantee that my explanation here is correct.
Hope this helps!
Re^2: How to encode and decode chinese string to iso-8859-1 encoding format
by thanos1983 (Parson) on Nov 11, 2017 at 08:00 UTC