Ok. I just read the Joel article linked to from the perlunitut page that shmem linked to. So things are a little clearer now. :)
The "secret decoder bytes" (BOM) is a Unicode-specific thing. {snip} I think it's also only commonly used when using UCS-2 on Windows.
Ah. Ok. Incidentally, I'm not running MS Windows, but rather am using Emacs on GNU/Linux, occasionally making use of ncurses-hexedit. Emacs, running as a GUI under X, happens to have a little area where you can hover the mouse and it tells you encoding information, but I've only ever seen it tell me ascii or iso-latin-1.
Also - to clarify, "Unicode" by itself isn't really an encoding {snip}
Ah. Now things are clearer. I see now that Unicode is simply a character set in which each character has a number associated with it (a so-called "code point"). And, as you point out, there are any number of ways you can encode it.
"Unicode" is the list of characters (with an associated number) and the various encodings (UTF8, UCS-2, UTF16, etc) specify how to convert that Unicode number to a sequence of bytes and back.
Very good.
Most interesting to me is that UTF-8 is a *Unicode* encoding. Now things make a bit more sense. :)
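To convince myself, here is a minimal sketch of the code point versus encoding distinction. It's in Python rather than Perl purely for illustration, and the sample character (é, U+00E9) and variable names are my own picks, not anything from the thread:

```python
# One character, one code point, several possible byte encodings.
ch = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE (sample character, my choice)

code_point = ord(ch)               # the Unicode number for this character
utf8 = ch.encode("utf-8")          # two bytes in UTF-8
utf16_be = ch.encode("utf-16-be")  # two bytes in big-endian UTF-16, no BOM
latin1 = ch.encode("iso-8859-1")   # one byte in ISO Latin-1
with_bom = ch.encode("utf-16")     # Python's plain "utf-16" codec prepends a BOM

print(f"U+{code_point:04X}")
for name, data in [("utf-8", utf8), ("utf-16-be", utf16_be),
                   ("iso-8859-1", latin1), ("utf-16 (with BOM)", with_bom)]:
    print(f"{name:18} {data.hex(' ')}")

# Decoding reverses the mapping: bytes back to the same character.
assert utf8.decode("utf-8") == ch
```

Same code point each time; only the bytes differ. The last line of output also shows the BOM as the first two bytes, in whichever byte order the machine uses.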
I typically use GNU/Linux systems, and will look into what's involved with properly setting them up to use UTF-8. Thanks again!
In reply to Re^6: Unicode2ascii by j3 in thread Unicode2ascii by Haspalm2