in reply to tr{}{} doesn't wanna work.. what am I doing wrong?

Why do you wish to commit irreversible injury to your data?

Where is the eye of the needle in your data-processing that will allow nothing but the very most primitive of pre-electric typewriter keys through its tiny aperture?

Why aren’t you using UAX#44, UAX#15, UTS#10, and UTS#35 to guide you in this?


Chapter 9: The State of the Art

The state of the art has more by far to do with the knowledge and skill of its practitioners than with the subtleties of their tools, but tools can constrain that skill or set it free. The limitations of the tools are therefore also of some interest. They are of special interest now, because they are subject to rapid change.

9.1 The Hundred‐Thousand Character Alphabet

It is often said that the Latin alphabet consists of 26 letters, the Greek of 24 and the Arabic of 28. If you confine yourself to one case only, a narrow historical window and the dialect in power, this assertion can hold true. If you include both caps and lower case, accented letters and a global set of consonants and vowels — á à â å ã ä ą ă ā æ ǽ ç ć č ð đ é ł ñ ň ņ ő š ș þ ű ū ŵ ý ž ź ż and all the rest — the Latin alphabet is not 26 letters long after all; it is closer to 600 and able to increase at any time. The alphabet that classicists now use for classical Greek, with its long parade of vowels and diacritics — ά ὰ ᾶ ἀ ἃ ἅ ἆ ἁ ἅ ἃ ἇ ᾷ ᾇ, and so on — is modest by comparison: fewer than 300 glyphs altogether.

         To the 600‐character globalized Latin alphabet, mathematicians, grammarians, chemists, and even typographers are prone to make additions: arabic numerals, punctuation, technical symbols, letters borrowed from Hebrew, Greek, and Cyrillic, and, where the letterforms require or invite them, a few typographic ligatures and alternates as well. There is no hope at this stage of counting the number of sorts or glyphs precisely, but the total is clearly over a thousand.

         At the end of the eighteenth century, an English‐speaking hand compositor’s standard lower case had 54 compartments, holding roman or italic a to z, arabic numerals, basic ligatures, spaces, and punctuation. The upper case had another 98, containing caps and analphabetics. That total, 98 + 54 = 152, is the English‐speaking hand compositor’s minimum basic allotment. When more sorts are required, as they very often are, supplementary cases are used. Two pair give 304 compartments; three pair give 456; four pair give 608. How Gutenberg’s cases were arranged we do not know, but we know how big they were. He used not 26 but 290 different sorts, in one face and one size, in an unaccented script, to set his 42‐line Bible. The Monotype machine, built five centuries later, with 255 (later 272) positions in a standard matrix case, had fallen only a little ways behind.

         Early computers and e‐mail links were, by comparison, living in typographic poverty. The alphabet they used was the basic character set defined by the American Standard Code for Information Interchange, or ASCII. Each character was limited to seven bits of binary information, so the maximum number of characters was 2⁷ = 128. Thirty‐three of those were normally subtracted for control codes, and one was the code for an empty space. This leaves 94: not even enough to hold the standard working character set of Spanish, French, or German. The fact that such a character set was long considered adequate tells us something about the cultural narrowness of American civilization, or American technocracy, in the midst of the twentieth century.

         The extended ASCII character set, which has been in general use since 1980, is made from eight‐bit characters. This gives 2⁸ = 256 slots altogether. As a rule, glyphs are assigned to some 230 of these. Editing and composition software often limits the working selection to 216 or less. The upper register of this set — altogether invisible on a normal computer keyboard — is usually filled out with characters selected from the Latin 1 Character Set established by ISO (the International Organization for Standardization, Geneva). These characters — ä ç é ñ and so on — are identified and discussed in appendix B, page 301.

         The allotment of 216 or 230 characters is meagre but adequate for basic communication in all the ‘official’ languages of Western Europe and North America. This ignores the needs of mathematicians, linguists, and other specialists, and of millions of normal human beings who use the Latin alphabet for Czech, Hausa, Hungarian, Latvian, Navajo, Polish, Romanian, Turkish, Vietnamese, Welsh, Yoruba, and so on. The extended ASCII character set is the alphabet not of the real world nor of the UN General Assembly but of NATO: a technological memento of the them‐and‐us mentality that thrived in the Cold War.

         Good, affordable software that would handle thousands of characters efficiently was for sale (and in fact was widely used) in the early 1980s. Standardization within the industry shrank this palette down, then enormously increased it. Some typographic tools have not caught up. Typographically sectarian and culturally stunted software is widespread.

         Earlier typographers were free to cut another punch at any time and cast another character. The freedom to do likewise exists with the computer. But finding room for all these letters in a shared standard alphabet involves, in the digital world, a shift from eight‐bit to sixteen‐bit characters. When we make this change, the alphabet increases to 2¹⁶ = 65,536 characters. The first version of a standard set of characters this size — known as Unicode — was roughed out at the end of the 1980s and published in the early 1990s. By the year 2000, the rudiments of Unicode were embedded in the operating systems of home computers, and major digital founders had adopted it as the new encoding standard.

         It is, like any standard, less than perfect, but it forms a working protocol both for a global Latin alphabet and for the technological coexistence of Arabic, Bengali, Chinese, Cyrillic, Devanagari, Greek, Hebrew, Korean, Latin, Thai, Tibetan and hundreds of other scripts. It was soon clear, however, that 65,000 characters wasn’t enough. To extend the set, 2¹¹ = 2,048 of the original allotment were assigned to function in pairs. This permits an additional 1024² = 1,048,576 characters. In its latest published form (version 5.0, with the 5.1.0 additions, issued in 2007), Unicode defines 100,507 characters, sets 137,468 aside for private use, and still has roughly 874,000 free for future allocation.

         Few of us may need (and few may want to memorize) 100,000 characters. Typographers working in Chinese have often mastered 20,000; those who work in Korean learn 3,000 or more; most literate humans learn a thousand characters or fewer. Yet authors, editors, typographers, and ordinary citizens who just want to be able to spell Dvořák, Miłosz, Mą’ii, or al‐Fārābī, or to quote a line of Sophocles or Pushkin, or the Vedas or the Sutras or the Psalms, or to write φ ≠ π, are beneficiaries of a system this inclusive. So is everyone who wants to read their e‐mail in an alphabet other than Latin or a language other than English.

         There may also never be a font of 100,000 well‐made characters designed by one designer. But good fonts with well over ten thousand characters, keyed to the Unicode system, are now readily available. Computer operating systems now support them. More importantly, fonts for particular symbol sets and alphabets can be linked and tuned to one another by adjusting weight, letterfit and scale. This kind of typographic diplomacy is a task of some importance — and when character sets are joined in this way, sharing typographic space whether or not they are all on one font, Unicode can serve as a coordinating mechanism.

         Unicode is relatively new, but many of the resources it catalogues are ancient. Composition software, communication links, and keyboards are just starting to catch up.


pages 179–182 of "The Elements of Typographic Style", version 3.2
Copyright © 1992, 1996, 2004, 2005, 2008 by Robert Bringhurst
Hartley & Marks Publishers (2008)
ISBN: 0-88179-205-5

Designed & typeset in Canada;
printed & bound in China.
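
The counts in the passage above are easy to check for yourself. What follows is only a sketch, in Python rather than this thread's Perl; every variable name and the `to_surrogates` helper are mine, not Bringhurst's. The one fact it takes from the Unicode Standard rather than from the quoted text is where the private-use ranges sit (U+E000 through U+F8FF, plus planes 15 and 16 less their last two code points each).

```python
# Seven-bit ASCII: 2**7 codes, less 33 control codes, less the space.
ascii_codes = 2 ** 7
ascii_visible = ascii_codes - 33 - 1

# The same count read off the code chart: U+0021 through U+007E.
visible_chars = [chr(c) for c in range(0x21, 0x7F)]

# Hand compositor's cases: one pair, then two, three, and four pairs.
case_pair = 98 + 54
case_totals = [case_pair * n for n in (1, 2, 3, 4)]

# The 2**11 code points of the sixteen-bit set that were reserved to work in pairs.
high_surrogates = range(0xD800, 0xDC00)   # 1,024 leading halves
low_surrogates = range(0xDC00, 0xE000)    # 1,024 trailing halves
reserved = len(high_surrogates) + len(low_surrogates)
paired = len(high_surrogates) * len(low_surrogates)

# Private use: U+E000..U+F8FF, plus planes 15 and 16 minus two code points each.
private_use = len(range(0xE000, 0xF900)) + 2 * (0x10000 - 2)


def to_surrogates(code_point):
    """Split a code point above U+FFFF into its (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF are paired")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(ascii_visible, len(visible_chars))   # 94 94
print(case_totals)                         # [152, 304, 456, 608]
print(reserved, paired)                    # 2048 1048576
print(private_use)                         # 137468
```

For instance, `to_surrogates(0x1D11E)` returns `(0xD834, 0xDD1E)`, which is the same pair of sixteen-bit units you get from encoding U+1D11E as UTF-16.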


Are you completely certain that you have no choice but to turn back the calendar by fifty years and more?

Because if you are, then you are going about this wrong. And if you aren’t, you shouldn’t be doing it at all.
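
To make the UAX#15 pointer concrete, here is a rough sketch of what going through normalization might look like instead of listing characters by hand in a transliteration table. It is in Python rather than Perl, purely for illustration, and `strip_marks` is a hypothetical helper of my own naming. The assumption is that you decompose to NFD and then discard nonspacing marks. Note what it shows: the step is lossy, and it still does not reach letters such as ł that have no canonical decomposition.

```python
import unicodedata


def strip_marks(text):
    """Decompose to NFD, then drop nonspacing marks (category Mn). Lossy."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


# Canonically equivalent spellings compare equal once both are normalized.
composed = "\u00e9"        # é as a single code point
combining = "e\u0301"      # e followed by COMBINING ACUTE ACCENT
same_after_nfc = unicodedata.normalize("NFC", combining) == composed

print(same_after_nfc)            # True
print(strip_marks("Dvořák"))     # Dvorak
print(strip_marks("Fārābī"))     # Farabi
print(strip_marks("Miłosz"))     # Miłosz (ł has no decomposition)
```

Comparing normalized strings keeps the data intact; stripping marks throws information away for good, which is the injury the questions at the top are about.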
