
Those in love with Perl but with less experience of Unicode will likely see things according to the majority opinion here, particularly those inexperienced with Asian languages and fonts. European languages have many characters which, if not in the lower ASCII range, can be adequately represented by UTF-7 and do not require full UTF-8 compatibility. Because there are issues with the implementation of UTF-7, it may even be partly due to those difficulties that the present Perl documents cast doubt on the security of using UTF-8.

From an online source:

UTF-7 isn't a "Unicode Transformation Format", as the definition can only encode code points in the BMP. However if a UTF-7 translator is to/from UTF-16 then it can encode each surrogate half as though it was a 16-bit code point, and thus can encode all code points. It is unclear if other UTF-7 software support this. UTF-7 has never been an official standard of the Unicode Consortium. It is known to have security issues, which is why software has been changed to disable its use. It is prohibited in HTML 5.

However, UTF-7 was never useful (to my knowledge) for Asian languages. It is not truly "Unicode," and even calling it "Unicode compatible" would be an exaggeration. It does not, and I think cannot, support the full range of characters in today's Unicode tables (likely why it has become obsolete).

Perl's Unicode support is problematic for many reasons. The core modules are not all Unicode compliant, much less the run-of-the-mill modules to be found on CPAN. While it is possible to write one's own Unicode-compliant code in Perl, perhaps by adapting others' modules or writing one's own, that is not the same as saying that Perl is already Unicode compatible. As the dictionary definition indicates, being "compatible" means not requiring special adaptation or modification--something which cannot yet be said of Perl, considering the gymnastics the average coder goes through to learn the ropes of enabling Unicode in his or her code. When the module "Encode" was removed from core, the gymnastic routines started over again to learn the new ways to handle Unicode.

Perl is certainly adaptable. But being adaptable is not the same as coming with those adaptations already built in and ready to use. One cannot just say: print "$unicode_string\n"; and expect a beautiful output as if the text were not Unicode.
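To make the point about the print statement concrete, here is a minimal sketch of the extra steps involved. It assumes a UTF-8 terminal; the sample string and the variable names other than $unicode_string are illustrative only. Without the binmode line, Perl would warn about a "Wide character in print".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Three CJK characters, written as escapes so that this file stays pure ASCII.
my $unicode_string = "\x{65E5}\x{672C}\x{8A9E}";

# Internally Perl counts characters, not bytes.
my $chars = length $unicode_string;

# Turning the string into UTF-8 bytes is an explicit step.
my $bytes      = encode('UTF-8', $unicode_string);
my $byte_count = length $bytes;

# The output handle has to be told which encoding to use.
binmode(STDOUT, ':encoding(UTF-8)');
print "$unicode_string\n";
print "characters: $chars, bytes: $byte_count\n";   # characters: 3, bytes: 9
```

The same string is three characters to Perl but nine bytes on the wire, and it is the programmer who has to say where that conversion happens.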

The problem is that Perl seems to have its own internal standard for text, and it translates all input and output relative to that standard. If it were possible to say something like "use unicode;" as a pragma at the beginning of one's script, inducing Perl to treat ALL program input, output, and internals as Unicode, then I would say it not only "supports" Unicode but is fully "compatible" with it. Unfortunately, this is still a dream, not a reality.
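For comparison, the nearest approximation I know of takes several declarations, not one pragma, and still leaves gaps. The following is only a sketch under that assumption: decode_args is a hypothetical helper, not part of any module, and it is there because command-line arguments still arrive as raw bytes.

```perl
use strict;
use warnings;
use utf8;                               # string literals in this file are UTF-8
use open qw( :encoding(UTF-8) :std );   # default layer for open(), plus STDIN/STDOUT/STDERR
use Encode qw(decode);

# Hypothetical helper: @ARGV is not covered by the pragmas above,
# so each argument is decoded from UTF-8 bytes by hand.
sub decode_args {
    return map { decode('UTF-8', $_) } @_;
}

my @args = decode_args(@ARGV);
print "$_\n" for @args;
```

Even with these lines in place, data coming from the environment, databases, or other modules may need its own handling.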

Yes, you can program for Unicode with Perl--it is supported. But it is not easy, as any of us with extensive experience can tell you--and it requires that one's code be specially adapted to handle Unicode, thus failing the dictionary definition of "compatible." If you choose to believe that "compatible" and "supported" are fully synonymous, then so be it. We may need to agree to disagree, as I do not see them as identical words, and my usage here follows my understanding of their separate meanings.

Blessings,

~Polyglot~

In reply to Re^7: Converting Unicode by Polyglot
in thread Converting Unicode by BernieC
