
OK. I just read the Joel article linked from the perlunitut page that shmem pointed to. So things are a little clearer now. :)

The "secret decoder bytes" (BOM) are a Unicode-specific thing. {snip} I think they're also commonly used only with UCS-2 on Windows.

Ah. OK. Incidentally, I'm not running MS Windows; I use Emacs on GNU/Linux, and occasionally ncurses-hexedit. Emacs, running as a GUI under X, has a little area you can hover the mouse over to see encoding information, but I've only ever seen it report ascii or iso-latin-1.
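For anyone else poking at files in a hex editor, here is a minimal sketch (in Python rather than Perl, purely for illustration) of what a BOM looks like on disk. It assumes Python 3.8+ for `bytes.hex(' ')`. Python has no separate UCS-2 codec; for characters in the Basic Multilingual Plane, UTF-16 produces the same bytes UCS-2 would, so "utf-16" stands in for it here.

```python
# Show which encodings write a byte-order mark (BOM) at the start of the data.
text = "Hi"

# "utf-16" with no byte order specified prepends a BOM in the machine's native order.
utf16 = text.encode("utf-16")
# "utf-16-le" states the byte order explicitly, so no BOM is written.
utf16_le = text.encode("utf-16-le")
# "utf-8-sig" is Python's name for UTF-8 with the optional EF BB BF signature.
utf8_sig = text.encode("utf-8-sig")
# Plain UTF-8 has no BOM.
utf8 = text.encode("utf-8")

for name, data in [("utf-16", utf16), ("utf-16-le", utf16_le),
                   ("utf-8-sig", utf8_sig), ("utf-8", utf8)]:
    print(f"{name:10} {data.hex(' ')}")

# Decoding with "utf-16" reads the BOM to pick the byte order, then drops it.
assert utf16.decode("utf-16") == text
```

On a little-endian machine the "utf-16" line starts with `ff fe`; the "utf-8-sig" line is `ef bb bf 48 69`, and the plain "utf-8" line is just `48 69`.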

Also - to clarify, "Unicode" by itself isn't really an encoding {snip}

Ah. Now things are clearer. I see now that Unicode is simply a character set, in which each character has a number associated with it (its so-called "code point"). And, as you point out, there are any number of ways you can encode it.

"Unicode" is the list of characters (each with an associated number), and the various encodings (UTF-8, UCS-2, UTF-16, etc.) specify how to convert that Unicode number to a sequence of bytes and back.

Very good.
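To make the "number vs. bytes" distinction concrete, here is a small sketch (again in Python, just as an illustration) taking one character through several encodings. Note that latin-1 is not a Unicode encoding as such; it is included only because it happens to map the first 256 code points straight to single bytes.

```python
ch = "\u00e9"  # é, LATIN SMALL LETTER E WITH ACUTE

# The code point is the Unicode number; it does not depend on any encoding.
code_point = ord(ch)
print(f"U+{code_point:04X}")

for encoding in ("utf-8", "utf-16-be", "utf-32-be", "latin-1"):
    data = ch.encode(encoding)           # number -> bytes
    assert data.decode(encoding) == ch   # bytes -> number ("and back")
    print(f"{encoding:10} {data.hex(' ')}")
```

This prints `U+00E9`, then `c3 a9` for UTF-8, `00 e9` for UTF-16 (big-endian), `00 00 00 e9` for UTF-32 (big-endian), and a single `e9` for latin-1: same character, same code point, four different byte sequences.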

Most interesting to me is that UTF-8 is a *Unicode* encoding. Now things make a bit more sense. :)
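A quick sketch (Python, illustration only) of why UTF-8 counts as a full Unicode encoding: it can represent every code point, using 1 to 4 bytes per character, and for code points below 128 it produces exactly the same bytes as ASCII. So a file containing only such characters is simultaneously valid ASCII, ISO-Latin-1 and UTF-8.

```python
samples = {
    "A": "U+0041, ASCII range",
    "\u00e9": "U+00E9, Latin-1 range",
    "\u20ac": "U+20AC, euro sign",
    "\U0001F600": "U+1F600, outside the Basic Multilingual Plane",
}

# UTF-8 uses a variable number of bytes depending on the code point.
for ch, label in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{label:45} {len(encoded)} byte(s): {encoded.hex(' ')}")

# For code points below 128, UTF-8 and ASCII give identical bytes.
ascii_text = "plain old ASCII"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")
```

The four samples take 1, 2, 3 and 4 bytes respectively (`41`, `c3 a9`, `e2 82 ac`, `f0 9f 98 80`).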

I typically use GNU/Linux systems, and will look into what's involved with properly setting them up to use UTF-8. Thanks again!

In reply to Re^6: Unicode2ascii by j3
in thread Unicode2ascii by Haspalm2