I understand your sentiment, but in fairness to the folks creating Unicode, the design process involves a lot of tough calls... The number of languages that use "diacritic" combinations on basic characters is somewhat astonishing, and for the ones that had any sort of pre-existing standard for character encoding, there's the initial problem of the "inertia" from established practice (e.g. collating logic).

The combinatorial problem might seem relatively trivial for this or that language taken on its own, but it gets pretty cumbersome when each of the many syllable-based scripts requires a few thousand combined forms, built from fewer than a hundred basic components. And there are actually quite a few text-processing applications where it really helps to have the graphemes expressed in terms of their individual components, because each component tends to have a stable linguistic function or "meaning" in the structure of the language.
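To make the "components" point concrete, here is a minimal sketch (in Python rather than Perl, only because its standard `unicodedata` module makes it short) that splits one precomposed Korean syllable into its component jamo via NFD normalization. The helper name `components` and the choice of sample syllable are mine, purely for illustration:

```python
import unicodedata

def components(text):
    """Return (code point, character name) pairs for the NFD form of text."""
    decomposed = unicodedata.normalize("NFD", text)
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in decomposed]

syllable = "\uD55C"  # one precomposed code point: HANGUL SYLLABLE HAN

for code_point, name in components(syllable):
    print(code_point, name)
# U+1112 HANGUL CHOSEONG HIEUH
# U+1161 HANGUL JUNGSEONG A
# U+11AB HANGUL JONGSEONG NIEUN

# Recomposing (NFC) gives back the single precomposed code point.
assert unicodedata.normalize("NFC", unicodedata.normalize("NFD", syllable)) == syllable

# The precomposed block covers every combination of 19 leading consonants,
# 21 vowels and 28 trailing choices (27 consonants plus "none").
print(19 * 21 * 28)  # 11172
```

So three reusable pieces stand in for one of 11,172 precomposed syllables, and each piece keeps its identity as an initial consonant, vowel, or final consonant.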

(I'm thinking about how Korean is handled -- even when you put aside their use of Chinese ideographs, they still use a lot of code points. Applying that approach to Hebrew, Arabic, Hindi, Bengali, Tibetan, Tamil, and several others is, frankly, not an attractive prospect, IMHO.)

I guess the point is: things will have to be complicated one way or another. If you try to simplify in one area, you end up making things more complicated elsewhere, and vice versa. The existing approach of using combining marks has some nice advantages, and its disadvantages are made a bit less painful by the Unicode Character Database (which is included with perl 5.8 distributions). It lets you look up any code point to see whether it's a "letter" or a "combining mark" (or a "number", or "punctuation", or "bracket", or...)
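As a rough illustration of that kind of lookup, here is a sketch in Python, whose `unicodedata` module reads the same Unicode Character Database (in Perl the equivalent would be regex properties such as `\p{L}` and `\p{M}`). The function names `major_class` and `reverse_keeping_marks` are made up for this example, and the reversal is deliberately naive: it only keeps combining marks attached to the preceding base character, which is not full grapheme-cluster handling.

```python
import unicodedata

def major_class(ch):
    """First letter of the general category: L(etter), M(ark), N(umber), P(unctuation), ..."""
    return unicodedata.category(ch)[0]

def reverse_keeping_marks(text):
    """Reverse text, keeping each combining mark after the base character it follows."""
    clusters = []
    for ch in text:
        if clusters and major_class(ch) == "M":
            clusters[-1] += ch      # attach the mark to the preceding base character
        else:
            clusters.append(ch)     # start a new cluster
    return "".join(reversed(clusters))

# Full two-letter categories for a few sample characters.
for ch in ["a", "\u0301", "5", "!", "("]:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch))
# U+0061 Ll   (lowercase letter)
# U+0301 Mn   (nonspacing combining mark)
# U+0035 Nd   (decimal number)
# U+0021 Po   (other punctuation)
# U+0028 Ps   (opening bracket)

word = "e\u0301a"                   # "e" + COMBINING ACUTE ACCENT, then "a"
assert reverse_keeping_marks(word) == "ae\u0301"
assert word[::-1] == "a\u0301e"     # a plain reversal moves the accent onto the "a"
```

The last two lines show why the category lookup matters here: reversing code points blindly puts the accent on the wrong letter.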


In reply to Re^5: [RFC] How to reverse a (text) string by graff
in thread [RFC] How to reverse a (text) string by moritz
