in reply to Re^2: Decoding bad UTF-16
in thread Decoding bad UTF-16
I switched from "open(FILE, "<:encoding(UTF-8)", $file)" to using decode()
eh? UTF-8?
decode and <:encoding are the same thing.
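To make that concrete, here is a minimal sketch that reads the same bytes both ways. The temp file and its contents are made up for the illustration, and UTF-16LE is only an assumption about what your data really is.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Temp qw(tempfile);

# Hypothetical sample file: "café\n" stored as UTF-16LE without a BOM.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode $fh;
print {$fh} encode('UTF-16LE', "caf\x{e9}\n");
close $fh or die "close: $!";

# Route 1: let the I/O layer decode while reading.
open my $in, '<:encoding(UTF-16LE)', $path or die "open: $!";
my $via_layer = do { local $/; <$in> };
close $in;

# Route 2: read the raw bytes, then decode them by hand.
open my $raw, '<:raw', $path or die "open: $!";
my $bytes = do { local $/; <$raw> };
close $raw;
my $via_decode = decode('UTF-16LE', $bytes);

print $via_layer eq $via_decode ? "same text\n" : "different text\n";
```

Either route hands you the same characters, so switching from one to the other will not change what gets accepted or rejected.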
UTF-16:Unrecognised BOM
When you specify UTF-16, the file must have a BOM. Specify the actual encoding (UTF-16le or UTF-16be) otherwise.
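A small sketch of the difference, using a made-up two-character string. The BOM-less "UTF-16" call is wrapped in eval because that is the case that produced the error quoted above; I'm not assuming what every Encode version does with it.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "Hi" as UTF-16 little-endian code units, no BOM (sample data).
my $no_bom = "H\x00i\x00";

# Naming the byte order explicitly works without a BOM.
my $explicit = decode('UTF-16LE', $no_bom);

# With a BOM (FF FE means little-endian), plain "UTF-16" can work out
# the byte order on its own, and the BOM is not part of the result.
my $with_bom = decode('UTF-16', "\xFF\xFE" . $no_bom);

# Without a BOM, plain "UTF-16" has nothing to go on.
my $guess = eval { decode('UTF-16', $no_bom) };
my $error = defined $guess ? '' : $@;

print "explicit: $explicit\n";
print "with BOM: $with_bom\n";
print "no BOM:   ", (defined $guess ? 'decoded' : "failed: $error"), "\n";
```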
I also tried using UCS-2, but I get "illegal unicoded char",
That's not possible. I've just shown you that every possible byte combination is accepted by decode.
Which bytes cause that, and what encoding did you specify, UCS-2le or UCS-2be?
Next, I'll try the suggested success/fail code, but I don't quite understand it.
It demonstrates that every byte combination works with UCS-2, and since UCS-2 is a very close relative of UTF-16, you'll get further by using that. It's probably what Word uses anyway, since Windows likes to lie about using UTF-16.
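In case the idea is still unclear, here is a rough sketch of that kind of test. It is not the code from the earlier post, just the same approach written out: feed every possible 16-bit unit to decode and tally what happens. UCS-2LE is an assumption; swap in UCS-2BE (and pack 'n') if that matches your data. No CHECK argument is passed, so decode runs with its default error handling.

```perl
use strict;
use warnings;
use Encode qw(decode);

my ($ok, $fail) = (0, 0);
my @failed;

for my $unit (0x0000 .. 0xFFFF) {
    my $bytes = pack 'v', $unit;    # one 16-bit little-endian unit

    # eval catches a die from decode; $text stays undef in that case.
    my $text = eval { decode('UCS-2LE', $bytes) };

    if (defined $text) {
        $ok++;
    }
    else {
        $fail++;
        push @failed, $unit;
    }
}

printf "ok=%d fail=%d\n", $ok, $fail;
printf "first failure: 0x%04X\n", $failed[0] if @failed;
```

If it reports any failures on your system, those are the byte pairs to look for in your file.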
Re^4: Decoding bad UTF-16
by gregality (Initiate) on Sep 29, 2008 at 20:39 UTC
by ikegami (Patriarch) on Sep 29, 2008 at 21:07 UTC