in reply to Re: UTF8 to UTF16, but...
in thread UTF8 to UTF16, but...
(...) and almut's guess about it being big-endian turns out to be wrong, then it must be little-endian ("UTF-16LE").
Not exactly a 'guess' :) Given the OP said that "0442043504410442" corresponds to the sample word "тест", it cannot really be little-endian, because that would be "4204350441044204".
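The byte-order argument above can be checked directly. This is only a minimal sketch using the core Encode module; the variable names are made up for illustration, and the word is written with code point escapes so the script does not depend on the source file's encoding.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "тест" as code points U+0442 U+0435 U+0441 U+0442
my $word = "\x{442}\x{435}\x{441}\x{442}";

# Encode with an explicit byte order (no BOM is added in either case)
# and dump the resulting bytes as hex.
my $be_hex = unpack("H*", encode("UTF-16BE", $word));
my $le_hex = unpack("H*", encode("UTF-16LE", $word));

print "UTF-16BE: $be_hex\n";   # 0442043504410442
print "UTF-16LE: $le_hex\n";   # 4204350441044204
```

Only the big-endian dump matches the string the OP posted.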
It may be worth noting that encode() with plain "UTF-16" (no "BE" or "LE" suffix) defaults to big-endian and prepends a BOM (quote from the Encode::Unicode documentation):
"When BE or LE is omitted during encode(), it returns a BE-encoded string with BOM prepended. So when you want to encode a whole text file, make sure you encode() the whole text at once, not line by line or each line, not file, will have a BOM prepended."
Of course, from the sample word alone we cannot tell whether a BOM is required.
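To illustrate the quoted warning about BOMs, here is a small sketch (the two-line sample text and variable names are invented for the example) comparing encoding the whole text at once with encoding it line by line using plain "UTF-16":

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical two-line text: "те\n" and "ст\n"
my @lines = ("\x{442}\x{435}\n", "\x{441}\x{442}\n");

# Whole text at once: a single BOM (feff) at the very start.
my $whole_hex = unpack("H*", encode("UTF-16", join("", @lines)));

# Line by line: every chunk gets its own BOM prepended.
my $per_line_hex = join("", map { unpack("H*", encode("UTF-16", $_)) } @lines);

print "whole:    $whole_hex\n";      # feff04420435000a04410442000a
print "per line: $per_line_hex\n";   # feff04420435000afeff04410442000a
```

If no BOM is wanted at all, naming the byte order explicitly ("UTF-16BE" or "UTF-16LE") avoids it.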
Re^3: UTF8 to UTF16, but...
by graff (Chancellor) on Dec 12, 2008 at 04:20 UTC