The following is from the perluniintro (perldoc) included with 5.8.0:
Perl's Unicode Support

Starting from Perl 5.6.0, Perl has had the capacity to handle Unicode natively. Perl 5.8.0, however, is the first recommended release for serious Unicode work. The maintenance release 5.6.1 fixed many of the problems of the initial Unicode implementation, but for example regular expressions still do not work with Unicode in 5.6.1.

Starting from Perl 5.8.0, the use of "use utf8" is no longer necessary. In earlier releases the "utf8" pragma was used to declare that operations in the current block or file would be Unicode-aware. This model was found to be wrong, or at least clumsy: the "Unicodeness" is now carried with the data, instead of being attached to the operations.

Only one case remains where an explicit "use utf8" is needed: if your Perl script itself is encoded in UTF-8, you can use UTF-8 in your identifier names, and in string and regular expression literals, by saying "use utf8". This is not the default because scripts with legacy 8-bit data in them would break. See utf8.
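To illustrate the "carried with the data" point, here is a minimal sketch (mine, not from perluniintro) that assumes Perl 5.8.0 or later. The source is plain ASCII and does not say "use utf8", yet length and the regex engine still treat U+263A as a single character. The variable names are only for illustration.

```perl
use strict;
use warnings;

# No "use utf8" here: the source file is plain ASCII, and the
# non-ASCII character is written with a \x{...} escape.
my $text = "hi \x{263A}";    # U+263A WHITE SMILING FACE

my $chars = length $text;                    # counted in characters
my $found = $text =~ /\x{263A}$/ ? 1 : 0;    # regex sees one character

# For contrast, make a byte-oriented copy: utf8::encode() converts
# the string in place to its UTF-8 octets.
my $bytes = $text;
utf8::encode($bytes);
my $octets = length $bytes;                  # counted in bytes

printf "chars=%d octets=%d match=%d\n", $chars, $octets, $found;
# prints: chars=4 octets=6 match=1
```

If the script itself were saved as UTF-8 and contained the character literally inside the quotes, that is the one case where "use utf8" would be needed.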
The 5.8.1 maintenance release made a few changes (http://search.cpan.org/src/JHI/perl-5.8.1/pod/perldelta.pod):
UTF-8 On Filehandles No Longer Activated By Locale
In Perl 5.8.0 all filehandles, including the standard filehandles, were implicitly set to be in Unicode UTF-8 if the locale settings indicated the use of UTF-8. This feature caused too many problems, so the feature was turned off and redesigned: see "Core Enhancements".
UTF-8 no longer default under UTF-8 locales
In Perl 5.8.0 many Unicode features were introduced. One of them was found to be of more nuisance than benefit: the automagic (and silent) "UTF-8-ification" of filehandles, including the standard filehandles, if the user's locale settings indicated use of UTF-8.

For example, if you had en_US.UTF-8 as your locale, your STDIN and STDOUT were automatically "UTF-8", in other words an implicit binmode(..., ":utf8") was made. This meant that trying to print, say, chr(0xff), ended up printing the bytes 0xc3 0xbf. Hardly what you had in mind unless you were aware of this feature of Perl 5.8.0. The problem is that the vast majority of people weren't: for example in RedHat releases 8 and 9 the default locale setting is UTF-8, so all RedHat users got UTF-8 filehandles, whether they wanted it or not. The pain was intensified by the Unicode implementation of Perl 5.8.0 (still) having nasty bugs, especially related to the use of s/// and tr///. (Bugs that have been fixed in 5.8.1)

Therefore a decision was made to backtrack the feature and change it from implicit silent default to explicit conscious option. The new Perl command line option -C and its counterpart environment variable PERL_UNICODE can now be used to control how Perl and Unicode interact at interfaces like I/O and for example the command line arguments. See perlrun/-C and perlrun/PERL_UNICODE for more information.

You can also now use safe signals with POSIX::SigAction. See POSIX/POSIX::SigAction.
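The chr(0xff) example above can be reproduced without touching your locale. This is a rough sketch of my own, assuming Perl 5.8 or later with PerlIO; the helper bytes_written is a made-up name, and it writes to an in-memory filehandle so the resulting bytes can be inspected directly.

```perl
use strict;
use warnings;

# Hypothetical helper: write one string through a filehandle opened
# with the given mode and return the bytes that arrived, as hex.
# The "file" is an in-memory scalar, so nothing touches the disk.
sub bytes_written {
    my ($mode, $string) = @_;
    my $buffer = '';
    open(my $fh, $mode, \$buffer) or die "open failed: $!";
    print {$fh} $string;
    close($fh) or die "close failed: $!";
    return join ' ', map { sprintf '0x%02x', ord($_) } split //, $buffer;
}

# No layer: what 5.8.1 does by default, whatever the locale says.
my $plain = bytes_written('>', chr(0xff));

# Explicit :utf8 layer: what 5.8.0 applied silently under a UTF-8 locale.
my $utf8 = bytes_written('>:utf8', chr(0xff));

print "plain: $plain\n";    # plain: 0xff
print "utf8:  $utf8\n";     # utf8:  0xc3 0xbf
```

On 5.8.1 and later the same opt-in for the standard handles would be something like binmode(STDOUT, ":utf8") inside the script, or running it as perl -CS script.pl; see perlrun for the exact -C letters.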
In reply to Re: Programmers, script languages, and Unicode by allolex, in thread Programmers, script languages, and Unicode by dbwiz