That's a start on what you need. As usual, the Devil is in the Details. Even if you solve the problem of encoding all human written communication into a bit stream, you still need to process that data. A few things you need to cover are:
- How do you do basic text transformations, like lc and uc? (Does the language in question even have a concept of upper/lowercase chars?) Should 'í' sort before or after 'i'? (These get harder to implement when you consider combining chars--which is why the Unicode-enabled perl 5.8 runs so much slower than earlier versions.)
- How do you store the glyphs? Assuming a 32-bit char set and each glyph taking up a 16x16 matrix of bits (black-and-white, no anti-aliasing or other fancy stuff), each glyph needs 32 bytes, so you would need 2**32 * 32 bytes = 128 GiB (about 137 GB) to store all possible glyphs. You need a way to break this up so that systems only have to deal with the subset of the data that the typical user cares about. (And I'm not sure that 16x16 bits would be big enough for many characters.)
- What do you do about old code that does things like tr/A-Z/a-z/;? ("Ignore it" is often a good answer.)
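To make the first and last points concrete, here is a minimal sketch, not a definitive treatment. It assumes perl 5.12 or later (for the unicode_strings feature) and the core modules Unicode::Normalize and Unicode::Collate. The helper ascii_lc and the sample strings are made up for illustration; the legacy-style lowercasing is written here as tr/A-Z/a-z/.

```perl
use strict;
use warnings;
use feature 'unicode_strings';   # Unicode rules for lc/uc on every string (perl 5.12+)
use Unicode::Normalize qw(NFC);  # core module
use Unicode::Collate;            # core module

# Hypothetical helper: legacy-style lowercasing that only ever touches ASCII A-Z.
sub ascii_lc {
    my ($s) = @_;
    (my $out = $s) =~ tr/A-Z/a-z/;
    return $out;
}

# "ECOLE" with a leading E-acute, escaped so this source file stays pure ASCII.
my $word = "\x{C9}COLE";

my $legacy  = ascii_lc($word);   # the accented capital is left untouched
my $unicode = lc $word;          # the accented capital is lowercased as well

# The same visible letter, i-acute, spelled two ways.
my $precomposed = "\x{ED}";      # one code point
my $combining   = "i\x{301}";    # 'i' followed by COMBINING ACUTE ACCENT

# Raw comparison sees two different strings; normalizing first makes them equal.
my $same_raw        = $precomposed eq $combining           ? 1 : 0;
my $same_normalized = NFC($precomposed) eq NFC($combining) ? 1 : 0;

# Plain sort compares code points; Unicode::Collate applies the default
# Unicode Collation Algorithm ordering instead.
my @words        = ("j", "\x{ED}", "i");
my @by_codepoint = sort @words;
my @by_uca       = Unicode::Collate->new->sort(@words);

binmode STDOUT, ':encoding(UTF-8)';
print "legacy lc:       $legacy\n";
print "unicode lc:      $unicode\n";
print "equal raw:       $same_raw\n";
print "equal after NFC: $same_normalized\n";
print "code point sort: @by_codepoint\n";
print "UCA sort:        @by_uca\n";
```

With the default collation table, 'í' lands right after 'i' and before 'j', while a plain sort pushes it past 'j'. Whether that is the "right" answer still depends on the language; Unicode::Collate::Locale exists for tailoring, but I haven't tried to cover that here.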
Also, IIRC, even with combining chars, 64 bits still isn't enough to cover all human language.
I hypothesize that computers would never have become so ubiquitous in people's lives if they had first been developed in a region with a complex alphabet. The problems of developing the user interface would have been so big that I bet most people would have given up and said "well, let's just use them to crunch numbers". We might be fortunate that computers were primarily developed in a region with a relatively simple alphabet.
"There is no shame in being self-taught, only in not trying to learn in the first place." -- Atrus, Myst: The Book of D'ni.