Re^4: Locale and Unicode, enemies in perl?

Re^5: Locale and Unicode, enemies in perl? by tchrist (Pilgrim) on Apr 14, 2011 at 17:10 UTC

> Note also that often, one doesn't have the choice between Unicode and locales. As a glue language, many Perl programs are written that just have to deal with data produced by other programs - and its format is given. Another reason why Perl should keep supporting locales.

Could you please explain what you mean by that?

There is a super-huge difference between supporting locales for I/O layers and expecting Perl to subvert its entire internal character representation scheme of Unicode. The former makes sound sense; the latter, does not.

It’s one thing to be able to handle input and output in some particular locale-dependent encoding — say hr_HR.ISO8859-2, zh_HK.Big5HKSCS, ru_RU.koi8r, or sv_SE.ISO8859-15.

However, it’s quite another to demand that Perl support a completely different scheme for how it internally stores and handles its own characters. That really is not reasonable. Render unto Caesar the things that are Caesar’s and all that: the outside world does not get to impose its own provincial ideas on how Perl stores and handles its own characters! Nobody should expect Perl to store the characters in its own memory using some ancient and antiquated Microsoft byte encoding, let alone follow its silly rules.
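Here is a minimal sketch of the I/O-layer side of that distinction. It is not code from the original thread. It decodes ISO-8859-2 input at the boundary with a PerlIO :encoding layer, works on ordinary character strings inside, and encodes to UTF-8 only on the way out. The in-memory filehandles and the sample word are hypothetical stand-ins for a real file or pipe fed by some other program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input: the Croatian word "Čaša" as raw ISO-8859-2 bytes,
# the way some other program might hand it to us (0xC8 = Č, 0xB9 = š).
my $latin2_bytes = "\x{C8}a\x{B9}a";

# Decode at the input boundary with a PerlIO :encoding layer.
# An in-memory filehandle stands in for a real file or pipe here.
open(my $in, '<:encoding(ISO-8859-2)', \$latin2_bytes)
    or die "cannot open input: $!";
my $text = <$in>;
close $in;

# Inside the program, $text is an ordinary Perl character string:
# four characters, regardless of which byte encoding they arrived in.
printf "%d characters: %s\n", length $text,
    join ' ', map { sprintf('U+%04X', ord($_)) } split //, $text;

# Encode again only at the output boundary, here as UTF-8.
my $utf8_bytes = '';
open(my $out, '>:encoding(UTF-8)', \$utf8_bytes)
    or die "cannot open output: $!";
print {$out} $text;
close $out;

printf "%d bytes of UTF-8 out\n", length $utf8_bytes;
```

This should report 4 characters (U+010C U+0061 U+0161 U+0061) and 6 bytes of UTF-8. Once the layer has decoded the input, nothing inside the program knows or cares that it was ISO-8859-2. The output encoding is a separate decision made at the other end.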

The sole reason the locale facility exists at all is that it happened to enter Perl before Unicode did. It should be considered nothing more than a tiny corner in which a niche legacy continues to be supported, kinda-sorta and rather limply, for no other reason than so that pre-existing Perl programs can keep working in legacy mode without requiring any updates.

System locales are a terrible pain, and a great way to write code that is at best guaranteed to be completely anti-portable. The sooner people upgrade from their legacy codesets, the better.

Note that I am specifically referring here to such matters as LC_CTYPE and LC_COLLATE. I am not talking about things like money or dates.
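For LC_COLLATE specifically, one locale-free alternative is the core Unicode::Collate module, which implements the Unicode Collation Algorithm. The post does not name it, so treat this as a sketch under that assumption. The word list is made up. The sketch compares Perl's default code point sort against UCA ordering. Neither sort consults setlocale, so the result does not depend on whichever locales the host happens to have installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                 # the word list below contains literal non-ASCII
use Unicode::Collate;     # core module implementing the Unicode Collation Algorithm

binmode STDOUT, ':encoding(UTF-8)';

# Made-up sample words mixing case and accents.
my @words = ('zebra', 'Éclair', 'apple', 'éclair', 'Zoo');

# Plain sort compares code points: ASCII capitals sort before
# lowercase letters, and accented letters land after 'z'.
my @by_codepoint = sort @words;

# Default DUCET collation: base letters first, then accents, then case.
my $collator = Unicode::Collate->new();
my @by_uca   = $collator->sort(@words);

print "code point order: @by_codepoint\n";
print "UCA order:        @by_uca\n";
```

With the default tables, code point order gives "Zoo apple zebra Éclair éclair" and UCA order gives "apple éclair Éclair zebra Zoo". The same ordering holds on any machine, with no LC_COLLATE involved.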
