The combinatorial problem might seem relatively trivial for any one language taken on its own, but it gets pretty cumbersome when each of the many syllable-based scripts requires a few thousand combined forms, built from fewer than a hundred basic components. And there are actually quite a few text-processing applications where it really helps to have the graphemes expressed in terms of their individual components, because each component tends to have a stable linguistic function or "meaning" in the structure of the language.
(I'm thinking about how Korean is handled -- even when you put aside its use of Chinese ideographs, the precomposed Hangul syllables alone take up 11,172 code points. Applying that approach to Hebrew, Arabic, Hindi, Bengali, Tibetan, Tamil, and several others is, frankly, not an attractive prospect, IMHO.)
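To make the "components" idea concrete, here is a minimal sketch (in Python rather than Perl, using only the standard `unicodedata` module) that splits one precomposed Hangul syllable into its component jamo through canonical decomposition (NFD). The helper name `components` is made up for this illustration.

```python
import unicodedata

def components(text):
    """Return the character names of the canonical decomposition (NFD) of text."""
    # NFD replaces each precomposed character with its component code points.
    return [unicodedata.name(ch) for ch in unicodedata.normalize("NFD", text)]

syllable = "\ud55c"  # U+D55C HANGUL SYLLABLE HAN, a single precomposed code point
for name in components(syllable):
    print(name)
```

The single code point comes back as three jamo (initial consonant, vowel, final consonant), which is the kind of component-level view the text-processing applications mentioned above would work with.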
I guess the point is: things will have to be complicated one way or another. If you try to simplify in one area, you end up making things more complicated elsewhere, and vice versa. The existing approach of using combining marks has some nice advantages, and its disadvantages are made a bit less painful by the presence of the Unicode Character Database (which is included with Perl 5.8 distributions). It lets you look up any code point to see whether it's a "letter" or a "combining mark" (or a "number", or "punctuation", or "bracket", or...).
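As a rough sketch of that kind of lookup (again in Python, whose standard `unicodedata` module exposes the same database's General_Category values), the code below classifies a code point and then uses the "combining mark" test to reverse a string without separating marks from their base letters. Both function names are hypothetical, and attaching every mark to the preceding character is a simplifying assumption, not full grapheme-cluster segmentation.

```python
import unicodedata

# Map the first letter of the General_Category value to a rough label.
MAJOR_CLASSES = {"L": "letter", "M": "combining mark", "N": "number", "P": "punctuation"}

def classify(ch):
    """Return a rough class for one character, based on its General_Category."""
    return MAJOR_CLASSES.get(unicodedata.category(ch)[0], "other")

def reverse_keeping_marks(text):
    """Reverse text while keeping each combining mark after its base character."""
    clusters = []
    for ch in text:
        if clusters and classify(ch) == "combining mark":
            clusters[-1] += ch   # attach the mark to the preceding base character
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))

print(classify("\u05d0"), "/", classify("\u05b8"))  # Hebrew alef, then the point qamats
print(reverse_keeping_marks("e\u0301x") == "xe\u0301")
```

A plain character-by-character reversal of the same input would leave the acute accent at the front of the string, detached from the "e", which is exactly the problem the category lookup avoids.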
In reply to Re^5: [RFC] How to reverse a (text) string
by graff
in thread [RFC] How to reverse a (text) string
by moritz