Re: How to encode and decode chinese string to iso-8859-1 encoding format

Hi, now that I'm looking at this on a full-size screen, I see a couple of things.

First, in the second set of statements you encode $str into $secondaryOctets, but then print the result of decoding $str instead of $secondaryOctets.
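I can't see your full code here, so the following is only a hypothetical sketch of the corrected second set of statements. $str and $secondaryOctets are your names; $roundTrip and the test string "café" (chosen because all of its characters fit in ISO-8859-1) are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical test string: "café", whose characters all fit in ISO-8859-1.
my $str = "caf\x{e9}";
my $secondaryOctets = encode('ISO-8859-1', $str);

# Decode the octets you just produced, not the original $str.
my $roundTrip = decode('ISO-8859-1', $secondaryOctets);
print $roundTrip eq $str ? "round trip ok\n" : "mismatch\n";
```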

Fixing that alone does not solve the issue, though. When you enable the utf8 pragma (use utf8;) for your source code, Perl reads any non-ASCII characters in your string literals as characters, rather than as sequences of bytes. This works well for your first case, where you encode to and decode from UTF-8. But ISO-8859-1 is a single-byte encoding that only covers code points 0 to 255, so it has no way to represent these Chinese characters, and you get the wide character error. You should not tell Perl that the source code is in UTF-8 *if* you plan to read it in as bytes.
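To see the difference between characters and bytes without depending on how the source file is saved, here is a small sketch that builds the first two characters (這是) from \x{...} escapes. It also uses Encode's FB_CROAK check flag, which makes encode() die on an unmappable character instead of quietly substituting '?'.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The characters 這是, written as escapes so the result does not depend
# on whether `use utf8` is in effect for this file.
my $chars = "\x{9019}\x{662F}";
print 'characters: ', length($chars), "\n";    # 2

# The same text as UTF-8 bytes: three bytes per character here.
my $bytes = encode('UTF-8', $chars);
print 'UTF-8 bytes: ', length($bytes), "\n";   # 6

# ISO-8859-1 only covers code points 0..255. With the default check,
# encode() replaces each unmappable character with '?'.
my $latin1 = encode('ISO-8859-1', $chars);
print "ISO-8859-1: $latin1\n";                 # ??

# With FB_CROAK, encode() dies instead of substituting.
my $ok = eval { encode('ISO-8859-1', $chars, Encode::FB_CROAK()); 1 };
print 'FB_CROAK encode ', ($ok ? 'succeeded' : 'died'), "\n";
```

The default substitution is worth knowing about, because it means a failed ISO-8859-1 encode can look like it "worked" and just print question marks.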

Similarly but separately, when you apply the ':utf8' IO layer to STDOUT, you are telling Perl to encode everything printed to that handle as UTF-8. That's not what you want once you've already encoded to ISO-8859-1, so you shouldn't apply the layer.
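To make that double encoding visible, this sketch prints the same ISO-8859-1 bytes to two in-memory filehandles, one with the ':utf8' layer and one in raw mode, and dumps the resulting bytes in hex. The test string "café" is an assumption, chosen so that it can actually be encoded as ISO-8859-1.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "café" encoded as ISO-8859-1: the bytes 63 61 66 E9.
my $latin1 = encode('ISO-8859-1', "caf\x{e9}");

# In-memory handle WITH the :utf8 layer, as with binmode STDOUT, ':utf8'.
open my $with_layer, '>:utf8', \my $buf_with or die "open: $!";
print {$with_layer} $latin1;
close $with_layer;

# In-memory handle with no encoding layer: bytes pass through unchanged.
open my $raw, '>:raw', \my $buf_raw or die "open: $!";
print {$raw} $latin1;
close $raw;

# The layer treats byte E9 as the character U+00E9 and writes it as UTF-8
# (C3 A9), so the output is no longer ISO-8859-1.
print 'with :utf8 -> ', unpack('H*', $buf_with), "\n";   # 636166c3a9
print 'raw        -> ', unpack('H*', $buf_raw),  "\n";   # 636166e9
```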

The following script attempts to demonstrate what I mean:

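use strict; use warnings; use feature 'say';
use Encode;

{
    say 'With UTF-8';
    use utf8;                          # lexically scoped: in effect only until the closing brace
    my $str = '這是一個測試';          # read as 6 characters
    my $perl = encode("utf8", $str);   # 18 bytes of UTF-8

    binmode STDOUT, ':utf8';           # encode everything printed to STDOUT as UTF-8
    say decode("utf8", $perl);
}

{
    say 'With ISO-8859-1';
    # No use utf8 in this block, so the literal below is read as its raw
    # source bytes (no need to "unload" the pragma from the block above).
    my $str = '這是一個測試';
    my $perl = encode("ISO-8859-1" , $str);

    binmode STDOUT;                    # no layer argument: back to raw bytes, drops :utf8
    say decode("ISO-8859-1", $perl);
}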

__END__

Outputs:

$ perl 1203139.pl

With UTF-8
這是一個測試
With ISO-8859-1
這是一個測試
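The second line of output looks right only by accident. Assuming the script file is saved as UTF-8, the literal in the second block (where use utf8 is not in effect) is read as its 18 raw UTF-8 bytes, and since every byte value 0..255 is also an ISO-8859-1 code point, the encode/decode pair hands back exactly the same bytes. Printed through a handle with no encoding layer, those bytes reach a UTF-8 terminal unchanged and are displayed as Chinese. This sketch simulates that byte string with Encode instead of a source literal:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulate the second block's literal without use utf8: the raw UTF-8
# bytes of 這是一個測試 (6 characters x 3 bytes = 18 bytes).
my $bytes = encode('UTF-8', "\x{9019}\x{662F}\x{4E00}\x{500B}\x{6E2C}\x{8A66}");

# Every byte is in 0..255, so ISO-8859-1 maps each one to itself.
my $round_trip = decode('ISO-8859-1', encode('ISO-8859-1', $bytes));

print length($bytes), "\n";                                  # 18
print $round_trip eq $bytes ? "unchanged\n" : "changed\n";   # unchanged
```

In other words, Perl never treated the text as six Chinese characters in that block; it just passed 18 bytes through untouched.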

Disclaimer: Working with encodings is very complicated, as you know, and I am not an expert in the field. As this example shows, there can be multiple overlapping issues, and a script can appear to work correctly when it is only doing so by accident. So while this is my best understanding, I don't guarantee that my explanation here is correct.

Hope this helps!

The way forward always starts with a minimal test.

Re^2: How to encode and decode chinese string to iso-8859-1 encoding format by thanos1983 (Parson) on Nov 11, 2017 at 08:00 UTC
Hello 1nickt, that makes a lot of sense. Thanks for the clarification. I am not an expert on binaries either; I am trying to learn a few things by experimentation. Thanks again for your time and effort in providing a small sample; it helped me understand a lot. BR / Thanos
Seeking for Perl wisdom...on the process of learning...not there...yet!