On Windows, it guesses, sometimes wrongly. This is the origin of the Notepad bug stories that come up from time to time. (There is a Windows API function, IsTextUnicode, which looks at the byte stream and tries to guess the encoding. Notepad calls this function, but it isn't reliable on short, even-length strings of ASCII, which it can mistake for UTF-16.)
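To see why a guesser can't be reliable here, the small Python sketch below (an illustration only, not IsTextUnicode's actual algorithm) checks an even-length ASCII string against two encodings. Both decodes succeed, so the bytes alone cannot settle the question and any detector has to fall back on heuristics. The helper name `decodes_as` is hypothetical.

```python
def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is a valid byte sequence in `encoding`."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

# 18 bytes of plain ASCII (even length): the string behind the
# well-known "Bush hid the facts" Notepad misdetection.
sample = b"Bush hid the facts"

print(decodes_as(sample, "ascii"))      # True
# Every ASCII byte is < 0x80, so each UTF-16 code unit built from two of
# them is < 0x8000 and can never be an (invalid) lone surrogate.
print(decodes_as(sample, "utf-16-le"))  # True

# Read as UTF-16-LE, each pair of ASCII bytes becomes one non-ASCII
# code unit (b"Bu" -> U+7542), giving 9 CJK-looking characters.
print(len(sample.decode("utf-16-le")))  # 9
```

Odd-length input can't be UTF-16 at all, which is why the trouble shows up specifically with even-length strings.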
It's also a bit more complex than that, because you can write a Byte Order Mark (BOM) at the beginning of the text stream: a short byte sequence (two bytes for UTF-16, three for UTF-8) indicating that the following characters are in a particular encoding. But this is in-band signalling, which kind of sucks: the same bytes are perfectly legal text in legacy 8-bit encodings, so the mark only really works if you already know the file is Unicode.
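A rough Python sketch of BOM sniffing, using the BOM constants from the standard `codecs` module. The function names `sniff_bom` and `decode_with_bom`, and the choice of UTF-8 as the fallback when no BOM is present, are assumptions for illustration.

```python
import codecs
from typing import Optional, Tuple

# Order matters: the UTF-32-LE BOM (FF FE 00 00) begins with the
# UTF-16-LE BOM (FF FE), so the longer marks are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_bom(data: bytes) -> Optional[Tuple[str, int]]:
    """Return (encoding, bom_length) if data starts with a known BOM, else None."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None

def decode_with_bom(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM if there is one; otherwise assume `fallback`."""
    found = sniff_bom(data)
    if found is None:
        # No BOM: the absence of a mark tells us nothing, so we just assume.
        return data.decode(fallback)
    name, skip = found
    return data[skip:].decode(name)

print(decode_with_bom("héllo".encode("utf-16")))  # héllo
print(decode_with_bom(b"plain ascii"))            # plain ascii
```

The in-band problem is easy to demonstrate: `"ï»¿".encode("latin-1")` is exactly `codecs.BOM_UTF8`, so a Latin-1 file that happens to start with those three characters would be taken for UTF-8. Likewise, UTF-16-LE text whose first character is U+0000 would be misread as UTF-32-LE.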
This is where UTF-8 shines. Since ASCII is a strict subset of UTF-8 (every ASCII byte sequence is valid UTF-8 and decodes to the same characters), you can treat a stream of bytes as UTF-8 and everything will be fine if the stream is actually ASCII. As long as the stream is one of those two, you're OK.
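The property that makes this work can be checked directly. The sketch below (the helper name `check_ascii_compat` is hypothetical) confirms that ASCII characters encode to their own single byte in UTF-8, while every byte of a multi-byte UTF-8 sequence has the high bit set, so a non-ASCII character can never be mistaken for ASCII bytes.

```python
def check_ascii_compat(text: str) -> bool:
    """True if every ASCII char encodes to its own single byte in UTF-8 and
    every non-ASCII char encodes only to bytes with the high bit set."""
    for ch in text:
        encoded = ch.encode("utf-8")
        if ord(ch) < 0x80:
            if encoded != bytes([ord(ch)]):
                return False
        elif any(b < 0x80 for b in encoded):
            return False
    return True

ascii_data = b"plain old ASCII"
# Decoding ASCII bytes as UTF-8 yields exactly the same string.
print(ascii_data.decode("utf-8") == ascii_data.decode("ascii"))  # True
print(check_ascii_compat("naïve café"))                          # True
```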
So there are two main camps:
Windows: We're slowly moving to two bytes everywhere (UCS-2, later UTF-16). People need to guess which encoding is in use.
Unix: We're moving from ASCII to UTF-8. If your app treats text files as UTF-8, it'll work happily with both ASCII and UTF-8 files.
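A short Python comparison of the same ASCII text in each camp's encoding. It assumes little-endian UTF-16 without a BOM as the "Windows" form, which is a simplification. It also shows why the camps don't mix: a UTF-8 reader accepts UTF-16 bytes without complaint (NUL is valid UTF-8) but gets the wrong text.

```python
text = "hello"

utf8_bytes = text.encode("utf-8")       # Unix camp
utf16_bytes = text.encode("utf-16-le")  # Windows camp (assumed: UTF-16-LE, no BOM)

print(len(utf8_bytes), len(utf16_bytes))  # 5 10
print(utf16_bytes)                        # b'h\x00e\x00l\x00l\x00o\x00'

# No error is raised, but there is now a NUL after every letter.
misread = utf16_bytes.decode("utf-8")
print(misread == text)  # False
```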
Re^4: Unicode2ascii
by j3 (Friar) on Nov 28, 2006 at 17:46 UTC
by jbert (Priest) on Nov 28, 2006 at 18:14 UTC
by j3 (Friar) on Nov 28, 2006 at 19:08 UTC