in reply to Re: OSCON Perl Unicode Slides
in thread OSCON Perl Unicode Slides
The ASCII alphabet proper consists only of the letters 'a' .. 'z' and 'A' .. 'Z'. The classic example is Jalapeno (which is written stupidly) versus Jalapeño, which is written correctly but doesn't fit into ASCII. This opens a big debate as to whether Jalapeño is an English word, and the short answer is that any word commonly used in English constitutes an English word (at least that's what many people assert).
Who am I to say that Björn Gunnlaugsson should change his name to Bjorn when he purchases a wallet at JC Penney? Yet that's what happens when the guy who programs POS terminals doesn't consider names that contain non-ASCII characters.
How dumb does it look to type '/' when we mean to use the obelus (÷) symbol? That isn't included in ASCII. Neither is the cent symbol (¢). Sure, we have our dumb workarounds like '/' and $0.01, and you might argue that ÷ and ¢ are not strictly English, but the proper symbols are what make a document appear well edited rather than typed by some guy at 2am on the Internewebs.
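For anyone who wants to see the problem concretely, here is a minimal sketch in Python (not from the original discussion) of what a naive "ASCII only" system does to the examples above. The function names `non_ascii_chars` and `naive_ascii_fold` are made up for illustration. The folding step assumes the common trick of decomposing with NFKD and throwing away whatever isn't ASCII, which may or may not be what any given POS terminal actually does.

```python
import unicodedata


def non_ascii_chars(text):
    """Return the characters of text whose code point lies outside ASCII (above 0x7F)."""
    return [ch for ch in text if ord(ch) > 0x7F]


def naive_ascii_fold(text):
    """Hypothetical lossy fold: decompose with NFKD, then drop every non-ASCII code point.

    Accented letters lose their accents; symbols with no ASCII
    decomposition (such as the obelus or the cent sign) vanish entirely.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    for sample in ("Jalapeño", "Björn Gunnlaugsson", "10 ÷ 2", "5¢"):
        print(repr(sample), non_ascii_chars(sample), repr(naive_ascii_fold(sample)))
```

The accented letters at least degrade to something readable, but ÷ and ¢ are simply dropped, which is arguably worse than the '/' and $0.01 workarounds.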
Another example is found in the Wikipedia Æ entry:
In English, usage of the ligature varies in different places. In modern typography, and where technological limitations make its use difficult (such as in use of typewriters), æ is often eschewed in favor of the digraph ae. This is often considered incorrect especially when rendering foreign words where æ is considered a letter (e.g. Æsir, Ærø) or brand names which make use of the ligature (e.g. Æon Flux, Encyclopædia Britannica). In the United States, the problem of the ligature is sidestepped in many cases by use of a simplified spelling with "e"; compare the common usage, medieval, with the traditional mediæval. However, given the long history of such spellings, they are sometimes used to invoke archaism or in literal quotations of historic sources; for instance, words such as dæmon are often treated in this way. Often, it will be replaced with a simple "ae" as in archaeology.
Update: I should have expected this to escalate, so please let me try to douse the fire. This post was an attempt (however inadequate it may have been) to explain the arguments that I have seen tchrist present in support of the assertion that was the basis for Tanktalus's question. I will admit that my examples may not rise to the level of the ones he might have presented. I didn't intend to start an argument over the merits of his assertion either; I was just trying to give a few examples of what he's talking about. Though I tend to agree with the principle that ASCII is only a subset of the characters needed to gracefully express a language, it's an endless and pointless debate. Pointless because Unicode is here to address the issues, and endless because Unicode isn't going away.
Dave