in reply to Re^2: [RFC] How to reverse a (text) string
in thread [RFC] How to reverse a (text) string
The term denormalization occurs neither in the normalization FAQ nor in the Unicode Normalization Forms report.
Isn't just one of the ways to encode the graphemes supposed to be the "normal form"?
But which one? There are good reasons for all of these forms.
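To make the "which one?" concrete, here is a minimal sketch using the core module Unicode::Normalize. The helper name `codepoints` is made up for this illustration. It only shows that NFC (composed) and NFD (decomposed) are both legitimate normal forms of the same grapheme. It makes no claim about which one you should prefer.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: list the code points of a string as "U+XXXX".
sub codepoints {
    my ($text) = @_;
    return join ' ', map { sprintf 'U+%04X', ord } split //, $text;
}

my $composed   = "\x{E9}";      # LATIN SMALL LETTER E WITH ACUTE
my $decomposed = "e\x{301}";    # "e" followed by COMBINING ACUTE ACCENT

# Same grapheme, different code point sequences: a plain eq says "different".
printf "raw equal: %s\n", ($composed eq $decomposed ? 'yes' : 'no');

# Normalizing both sides to the same form makes them comparable.
printf "NFC: %s\n", codepoints(NFC($decomposed));
printf "NFD: %s\n", codepoints(NFD($composed));
printf "equal after NFC: %s\n",
    (NFC($composed) eq NFC($decomposed) ? 'yes' : 'no');
```

NFC gives you the shorter, composed sequence, while NFD makes the base character and its marks individually visible, which is one reason both forms exist.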
In either case, thanks to both of you for uncovering a (what should I call it?!?) interesting feature of Unicode. I had no idea characters in Unicode can be not only multibyte, but also multi-codepoint. I guess the committee that invented Unicode was too big.
It's not the committee size, but rather the number of possible graphemes in all the languages of the world. If every one of them got its own code point and Unicode ended up with code points larger than 2^32, you wouldn't be happy either, would you?
And I think it is quite a natural approach to divide a grapheme into a base character and its decorations.
It's sad that it makes programming harder, but if you oversimplify, you lose correctness.
Sadly, Perl 5's builtins don't work at the grapheme level, only at the code point and byte levels. It's one of the many reasons why I'm looking forward to Perl 6...
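For the reversal problem in this thread, the usual workaround is to split the string with the regex escape `\X`, which matches a base character together with its combining marks, and reverse the resulting list. A minimal sketch follows. The sub name `reverse_graphemes` is made up for this example, and it assumes the input is already a decoded character string, not raw bytes.

```perl
use strict;
use warnings;

# Hypothetical helper: reverse a string grapheme by grapheme.
# \X matches one grapheme cluster (base character plus combining marks).
sub reverse_graphemes {
    my ($text) = @_;
    return join '', reverse $text =~ /\X/g;
}

# "resume" with a combining acute accent (U+0301) after each "e" but the middle one.
my $str = "re\x{301}sume\x{301}";

# Builtin reverse in scalar context works on code points,
# so each accent ends up in front of its "e" and attaches to the wrong character.
my $by_codepoint = reverse $str;

# Reversing the list of graphemes keeps each accent with its base character.
my $by_grapheme = reverse_graphemes($str);

printf "code point reverse: %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $by_codepoint;
printf "grapheme reverse:   %s\n",
    join ' ', map { sprintf 'U+%04X', ord } split //, $by_grapheme;
```

Normalizing to NFC first would hide the problem for this particular string, but not for graphemes that have no precomposed code point, so going through `\X` is the more general approach.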
Re^4: [RFC] How to reverse a (text) string
by Jenda (Abbot) on Dec 19, 2007 at 23:28 UTC
by graff (Chancellor) on Dec 20, 2007 at 04:10 UTC