Re^6: Unexpected output from my PERL program. WHAT is my problem???

iso-8859-1 can only represent a small subset of the characters supported by UTF-16. The conversion to iso-8859-1 may result in information loss.

Just for future reference in case I run across this situation, is there a charset that can be used to circumvent this issue? Or am I stuck with the potential loss? Thanks.

Replies are listed 'Best First'.

Re^7: Unexpected output from my PERL program. WHAT is my problem???
by ikegami (Patriarch) on Nov 04, 2009 at 18:02 UTC

is there a charset that can be used to circumvent this issue?

You mean "character encoding", not "character set". The question is:

Is there a character encoding that can represent the Unicode character set?

All UTF-* encodings can handle all Unicode characters.
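To make that concrete, here is a minimal sketch using the core Encode module. The sample string is made up for illustration, and the sketch relies on Encode's default behaviour of substituting "?" for characters the target encoding cannot represent.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Sample text (made up for this sketch): "é" (U+00E9), the euro sign
# (U+20AC) and a character above U+FFFF (U+1F600).
my $text = "caf\x{E9} \x{20AC} \x{1F600}";

# Encode and decode with each UTF-* encoding and record whether the
# string came back unchanged.
my %round_trip_ok;
for my $enc (qw(UTF-8 UTF-16BE UTF-16LE UTF-32BE UTF-32LE)) {
    my $bytes = encode($enc, $text);
    $round_trip_ok{$enc} = decode($enc, $bytes) eq $text ? 1 : 0;
    printf "%-10s %s\n", $enc, $round_trip_ok{$enc} ? 'lossless' : 'lossy';
}

# iso-8859-1 has no code for U+20AC or U+1F600. With the default CHECK
# value, Encode replaces each of them with "?", so the text is damaged.
my $latin1_bytes = encode('iso-8859-1', $text);
my $latin1_back  = decode('iso-8859-1', $latin1_bytes);
printf "%-10s %s\n", 'iso-8859-1', $latin1_back eq $text ? 'lossless' : 'lossy';
```

Only the iso-8859-1 pass should report a loss; the "é" survives it because U+00E9 is part of iso-8859-1.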

There's obviously something missing from the question since you started off with such a character encoding (UTF-16be).

Also worth mentioning are the UCS-2* encodings. UCS-2le and UCS-2be are the fixed-width subsets of UTF-16le and UTF-16be respectively. They can handle a big chunk of Unicode (U+0000..U+FFFF).
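A rough sketch of the difference, again with Encode. The helper name try_encode_hex is made up for this example. It asks Encode to croak (FB_CROAK) instead of substituting, so an unrepresentable character shows up as a failure.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: returns the encoded bytes as a hex string, or
# undef if the encoding cannot represent the character.
sub try_encode_hex {
    my ($enc, $char) = @_;
    my $bytes = eval {
        # FB_CROAK: die on unmappable characters.
        # LEAVE_SRC: do not modify the source string.
        encode($enc, $char, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return defined $bytes ? unpack('H*', $bytes) : undef;
}

my $euro  = "\x{20AC}";   # inside U+0000..U+FFFF
my $emoji = "\x{1F600}";  # above U+FFFF, needs a surrogate pair in UTF-16

for my $enc (qw(UCS-2BE UTF-16BE)) {
    for my $char ($euro, $emoji) {
        my $hex = try_encode_hex($enc, $char);
        printf "%-8s U+%04X -> %s\n", $enc, ord($char),
            defined $hex ? $hex : 'not representable';
    }
}
```

For U+20AC both encodings should produce the same two bytes. For U+1F600, UTF-16BE needs four bytes (a surrogate pair), and UCS-2BE has no way to encode it.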

Windows uses UCS-2le internally (UTF-16le since Windows 2000) and uses this for its Wide interface. UTF-8 is the encoding of choice elsewhere.

In fact, unix terminals tend to expect UTF-8 these days. It kinda surprised me when you asked for iso-8859-1.
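If the terminal does expect UTF-8, one way to get there is an :encoding(UTF-8) layer on the output handle. This is only a sketch: the sample text is made up, and the in-memory handle is there just to show which bytes the layer produces.

```perl
use strict;
use warnings;

# Sample text (made up for this sketch): "é" (U+00E9) and the euro sign.
my $text = "caf\x{E9} \x{20AC}\n";

# Assumption: the terminal expects UTF-8. The layer encodes every
# character string printed to STDOUT from here on.
binmode(STDOUT, ':encoding(UTF-8)');
print $text;

# The same layer on an in-memory handle, to inspect the resulting bytes.
my $buffer = '';
open(my $fh, '>:encoding(UTF-8)', \$buffer)
    or die "Cannot open in-memory handle: $!";
print {$fh} $text;
close($fh);

my $hex = unpack('H*', $buffer);
print "bytes: $hex\n";
```

Without the layer, perl warns about "Wide character in print" for the euro sign, and characters below U+0100 such as "é" go out as single iso-8859-1 bytes, which a UTF-8 terminal cannot display correctly.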