Erm ... if the transformation in one direction is "normalization", then the transformation in the opposite direction is denormalization, isn't it?

The term denormalization occurs neither in the normalization FAQ nor in the Unicode Normalization Forms report (UAX #15).

Isn't only one of the ways to encode a grapheme supposed to be the "normal form"?

But which one? Unicode defines four normalization forms (NFD, NFC, NFKD and NFKC), and there are good reasons for each of them.
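To make the "which one?" point concrete: composing and decomposing both yield a normal form, which is why neither direction is called "denormalization". Below is a minimal sketch in Python, chosen only because its standard `unicodedata` module makes a short runnable example; in Perl the corresponding functions are `NFD`, `NFC`, `NFKD` and `NFKC` from Unicode::Normalize. The sample strings and the `codepoints` helper are my own illustrative choices.

```python
import unicodedata

def codepoints(s):
    """Return the codepoints of s as 'U+XXXX' strings (illustrative helper)."""
    return [f"U+{ord(c):04X}" for c in s]

composed = "\u00e9"     # "é" as one precomposed codepoint
decomposed = "e\u0301"  # "é" as "e" + COMBINING ACUTE ACCENT

# NFD decomposes, NFC composes: both spellings meet in either form.
print(codepoints(unicodedata.normalize("NFD", composed)))    # ['U+0065', 'U+0301']
print(codepoints(unicodedata.normalize("NFC", decomposed)))  # ['U+00E9']

# The K ("compatibility") forms additionally fold characters such as the
# "fi" ligature U+FB01, which NFC leaves untouched.
ligature = "\ufb01"
print(codepoints(unicodedata.normalize("NFC", ligature)))   # ['U+FB01']
print(codepoints(unicodedata.normalize("NFKC", ligature)))  # ['U+0066', 'U+0069']
```

Which form to pick depends on the job: NFC is compact for storage and exchange, NFD makes the marks easy to inspect, and the K forms are lossy but useful for searching and comparison.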

In either case, thanks to both of you for uncovering a (what should I call it?!?) interesting feature of Unicode. I had no idea that characters in Unicode could be not only multibyte, but also multi-codepoint. I guess the committee that invented Unicode was too big.

It's not the committee size, but rather the number of possible graphemes in all the languages of the world. If Unicode needed codepoints larger than 2^32 to give every such grapheme its own, you wouldn't be happy either, would you?

And I think dividing a grapheme into a base character and a decoration (a combining mark) is quite a natural approach.
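Here is a small sketch of "multibyte and multi-codepoint" at the same time, again in Python purely for illustration. It assumes UTF-8 as the byte encoding; the grapheme "é" is spelled as base letter plus combining mark.

```python
import unicodedata

# One grapheme ("é"), spelled as base character + decoration.
grapheme = "e\u0301"

print(len(grapheme))                  # 2 (codepoints)
print(len(grapheme.encode("utf-8")))  # 3 (bytes: 1 for "e", 2 for U+0301)

for ch in grapheme:
    # combining() returns the canonical combining class: 0 for base
    # characters, non-zero for marks that attach to the preceding base.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch))
```

So one user-perceived character is two codepoints and three bytes here, and the combining class is what tells a program which codepoint is the decoration.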

It's sad that it makes programming harder, but if you oversimplify, you lose correctness.

Sadly, Perl 5's builtins don't work on the grapheme level, only on the codepoint and byte levels. It's one of the many reasons why I'm looking forward to Perl 6...
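For the string-reversal question of this thread, here is a minimal sketch of the difference between a codepoint-level and a grapheme-level reverse, written in Python for illustration. It is a simplified approximation, not a full implementation of extended grapheme clusters (UAX #29): `reverse_graphemes` is a hypothetical helper that treats a "grapheme" as one codepoint with combining class 0 followed by any codepoints with a non-zero class. Hangul jamo, emoji ZWJ sequences, CR LF and spacing marks with class 0 are not handled.

```python
import unicodedata

def reverse_graphemes(s):
    """Reverse s while keeping combining marks attached to their base.

    Approximation: a cluster is one codepoint with combining class 0
    followed by any codepoints with a non-zero combining class.
    """
    clusters = []
    for ch in s:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch   # attach the mark to the preceding base
        else:
            clusters.append(ch)  # start a new cluster
    return "".join(reversed(clusters))

word = "cafe\u0301"  # "café" with a decomposed é

# Codepoint-level reverse (what Perl 5's reverse does on a text string):
print(ascii(word[::-1]))                # '\u0301efac' (accent stranded at the front)
# Grapheme-level reverse (approximate):
print(ascii(reverse_graphemes(word)))   # 'e\u0301fac'
```

The codepoint-level version leaves the accent with no base to attach to, while the cluster-based version keeps "é" intact.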


In reply to Re^3: [RFC] How to reverse a (text) string by moritz
in thread [RFC] How to reverse a (text) string by moritz
