in reply to Re^5: Unicode vulgar fraction composition
in thread Unicode vulgar fraction composition
My mistake.
Then yeah, one could possibly argue that this should have been a canonical decomposition rather than a compatibility decomposition. But they'd be wrong.
A program is free to switch between the NFC and the NFD of a string at any time. As such, they should be visually and semantically indistinguishable. In other words, the two forms are simply two different ways of encoding graphemes internally.
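To make the distinction concrete, here is a small sketch in Python (not Perl, purely for illustration) using the standard `unicodedata` module. The variable names are mine. It shows a canonical round trip that loses nothing, and then shows that "⅞" is untouched by the canonical forms while its compatibility mapping is a one-way street.

```python
import unicodedata

# Canonical equivalence: NFD and NFC are two encodings of the same text.
a_ring = "\u00C5"                                   # LATIN CAPITAL LETTER A WITH RING ABOVE
a_ring_nfd = unicodedata.normalize("NFD", a_ring)   # "A" followed by U+030A COMBINING RING ABOVE
a_ring_nfc = unicodedata.normalize("NFC", a_ring_nfd)
round_trip_ok = (a_ring_nfc == a_ring)              # nothing is lost going NFC -> NFD -> NFC

# The vulgar fraction has only a compatibility decomposition.
seven_eighths = "\u215E"                            # VULGAR FRACTION SEVEN EIGHTHS
canonical_forms = {unicodedata.normalize(form, seven_eighths) for form in ("NFC", "NFD")}

# NFKD/NFKC replace it with DIGIT SEVEN, FRACTION SLASH (U+2044), DIGIT EIGHT.
compat = unicodedata.normalize("NFKC", seven_eighths)

# Composition never brings the single character back.
recomposed = unicodedata.normalize("NFC", compat)

print(round_trip_ok, canonical_forms == {seven_eighths}, recomposed == compat)
```

Once the compatibility mapping has been applied, no normalization form restores the original character, which is why a program can't treat the two spellings as interchangeable.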
Grapheme. (1) A minimally distinctive unit of writing in the context of a particular writing system. For example, ‹b› and ‹d› are distinct graphemes in English writing systems because there exist distinct words like big and dig. Conversely, a lowercase italiform letter a and a lowercase Roman letter a are not distinct graphemes because no word is distinguished on the basis of these two different forms. (2) What a user thinks of as a character.
"7/8" isn't a grapheme[1], much less the same one as "⅞". As such, the two strings could have different appearances or meanings, and it's easy to come up with an example where someone might intentionally use "7/8" over "⅞". Imagine a document containing "... between 7/8 and 15/16 of the ...". The author might purposefully not use "⅞" for stylistic consistency. It would not be proper for a program to automatically convert "7/8" to "⅞" wherever it occurs.
The short version is that no one can guess what transformations you want to perform, so it's up to you to determine the rules you want to follow, which is to say, write a program that does what you want. Do you want to change "7/8" into "⅞" unconditionally? Conditionally? What about LATIN CAPITAL LETTER A WITH RING ABOVE (Å)? Is there a time it should become ANGSTROM SIGN (Å)? And so on. These are decisions for you to make.
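As one possible illustration of "write a program that does what you want", here is a rough Python sketch. The helper names `build_fraction_table` and `compose_fractions` and the `only` parameter are hypothetical, and the rules it encodes (ASCII slash only, skip anything touching other digits or slashes) are just one set of choices, not a recommendation.

```python
import re
import unicodedata

FRACTION_SLASH = "\u2044"

def build_fraction_table():
    """Map ASCII spellings such as "7/8" to precomposed characters such as "⅞"."""
    table = {}
    # U+00BC..U+00BE and U+2150..U+215E hold precomposed vulgar fractions.
    for cp in list(range(0x00BC, 0x00BF)) + list(range(0x2150, 0x215F)):
        ch = chr(cp)
        decomposed = unicodedata.normalize("NFKD", ch)
        if FRACTION_SLASH in decomposed:
            table[decomposed.replace(FRACTION_SLASH, "/")] = ch
    return table

FRACTIONS = build_fraction_table()

def compose_fractions(text, only=None):
    """Replace ASCII fractions; `only` (an assumption of this sketch)
    optionally restricts which spellings get converted."""
    def repl(match):
        spelled = match.group(0)
        if only is not None and spelled not in only:
            return spelled
        # Spellings with no precomposed character (e.g. "15/16") are left alone.
        return FRACTIONS.get(spelled, spelled)
    # The lookarounds leave "17/8", "7/80" and dates like "1/2/2020" untouched.
    return re.sub(r"(?<![\d/])\d+/\d+(?![\d/])", repl, text)

print(compose_fractions("between 7/8 and 15/16 of the"))
print(compose_fractions("1/2 and 7/8", only={"1/2"}))
```

Note that the unconditional call turns "7/8" into "⅞" but leaves "15/16" alone, which is exactly the stylistic inconsistency described above. Whether that's acceptable is your call, hence the `only` filter.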
Note: I used a normal slash instead of a FRACTION SLASH throughout this post to avoid confusion, because my browser rendered fractions with a FRACTION SLASH much like "⅞", and yours might too. But it is under no obligation to do so, and other renderers won't do this.