
Would you like Perl to "automagically" encode/decode JSON? ASN.1? Why, specifically, do you demand it of UTF-8?

It's primarily a matter of convenience, and in some cases transparency (such as having a single point of configuration where the encoding can be switched, rather than requiring every piece of code to take care of it on its own).

The comparison to JSON or ASN.1 seems somewhat far-fetched to me. Unicode is envisaged (and, I think, widely accepted) to eventually become the successor of legacy character encodings such as Latin-1, with their well-known limits. And among the Unicode encodings, UTF-8 would presumably be a good choice for the default, because it was specifically designed with backwards compatibility in mind. In contrast, JSON and ASN.1 are rather special-purpose (and typically not used as character encodings), so I don't currently see any need for similar built-in support for them in Perl.

The truth is that UTF-8 is a variable-length character encoding method. It's probably a good thing that you have to explicitly decode inputs and encode outputs.. it forces you to know what you are doing.

Equally (with a hypothetical pure ASCII mind set in place) you could say: "The truth is that Latin-1 is a (specific) 8-bit character encoding method. It's probably a good thing that you have to explicitly decode inputs and encode outputs.. it forces you to know what you are doing." — Still, we do have Latin-1 semantics by default in Perl...
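For reference, the explicit step being debated looks roughly like this. It is only a minimal sketch using the core Encode module; the variable names and the sample bytes are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" (U+00E9) followed by the euro sign (U+20AC), as raw UTF-8 bytes.
my $bytes = "\xC3\xA9\xE2\x82\xAC";

# Without decoding, Perl counts each byte as one character.
my $byte_len = length $bytes;

# Explicit decoding turns the byte string into a character string.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

# Explicit encoding on the way out gives the original bytes back.
my $round_trip = encode('UTF-8', $chars);

printf "%d bytes, %d characters, round trip %s\n",
    $byte_len, $char_len, ($round_trip eq $bytes ? 'ok' : 'differs');
```

This should print "5 bytes, 2 characters, round trip ok". The undecoded string behaves as if every byte were a character of its own, which is the 8-bit default referred to above.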

Just because UTF-8 is variable length doesn't mean it wouldn't be a sensible choice in environments that otherwise make use of it, in particular when the programmer explicitly requests that very functionality using a pragma.
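Perl's existing open pragma already goes part of the way for file handles. The following is a rough sketch of what "explicitly requesting" UTF-8 looks like today, assuming a temporary file as a stand-in for real I/O; it only covers open() calls in the pragma's lexical scope, not strings in general.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Scratch file for the demonstration (hypothetical stand-in for real data).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh or die "close $path: $!";

my $text = "caf\x{E9} \x{20AC}5";   # character string: "café €5"
my $read_back;

{
    # Default layers for every open() in this block: no per-handle
    # encoding arguments and no explicit decode/encode calls needed.
    use open IO => ':encoding(UTF-8)';

    open my $out, '>', $path or die "write $path: $!";
    print {$out} $text;
    close $out or die "close $path: $!";

    open my $in, '<', $path or die "read $path: $!";
    $read_back = <$in>;
    close $in or die "close $path: $!";
}

my $size = -s $path;   # size on disk, in bytes

printf "%d characters read back, %d bytes on disk\n",
    length($read_back), $size;
```

This should print "7 characters read back, 10 bytes on disk": the data is UTF-8 in the file, but the code inside the block only ever sees characters. A pragma of the kind asked for in this thread would presumably extend the same idea beyond file handles.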

(...) MIME headers must be in a 7-bit encoding

The current 8-bit default for IO could cause just as much breakage as UTF-8 would in this case. I don't think that particular limits which apply to certain content (or parts thereof) are a good argument against generally providing a convenient way to say "I want UTF-8 to be used as default for all strings/content" (which is what I think the OP had in mind). Special cases can be dealt with in the application code. As things are now, UTF-8 (or, more generally, anything non-Latin-1) is still too often the "special case", rather than a (configurable!) global default.

In reply to Re^3: Pragma to handle unicode characters by almut
in thread Pragma to handle unicode characters by wanradt
