in reply to Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction

I am only concerned right now about getting a clue about how to convert things like "\x{be}" into things like 0.75.

You mean like this?


use 5.14.0;
use strict;
use warnings;
use open qw/:std :utf8/;    # encode output as UTF-8; avoids "Wide character" warnings
use charnames qw/:full/;
use Unicode::UCD 0.32 qw/num/;

my @cp = (
    "bogus",
    "\N{DIGIT FOUR}\N{DIGIT TWO}",
    "\N{VULGAR FRACTION THREE QUARTERS}",
    "\N{VULGAR FRACTION TWO THIRDS}",
    "\N{VULGAR FRACTION ONE SEVENTH}",
    "\N{VULGAR FRACTION SEVEN EIGHTHS}",
    "\N{SUPERSCRIPT THREE}",
    "\N{SUBSCRIPT EIGHT}",
    "\N{FULLWIDTH DIGIT TWO}\N{FULLWIDTH DIGIT FIVE}",
    "\N{ROMAN NUMERAL EIGHT}",
    "\N{ROMAN NUMERAL ONE HUNDRED THOUSAND}",
    "\N{BENGALI DIGIT FOUR}\N{BENGALI DIGIT SEVEN}\N{BENGALI DIGIT FIVE}\N{BENGALI DIGIT SIX}",
    "\N{RUMI NUMBER SEVEN HUNDRED}",
    "\N{AEGEAN NUMBER NINETY THOUSAND}",
    "\N{ORIYA FRACTION THREE SIXTEENTHS}",
    "\N{TIBETAN DIGIT HALF ZERO}",
    "\N{TIBETAN DIGIT HALF ONE}",
    "\N{TIBETAN DIGIT HALF SEVEN}",
    "\N{BENGALI CURRENCY NUMERATOR ONE LESS THAN THE DENOMINATOR}",
    "\N{GREEK ACROPHONIC ATTIC FIFTY THOUSAND STATERS}",
);

for my $cp (@cp) {
    # num() returns undef when the string has no numeric value; show that as "NaN".
    printf "%s\t= %20s\tU+%vX\n", $cp, num($cp) // "NaN", $cp;
}
__END__
bogus   =                  NaN  U+62.6F.67.75.73
42      =                   42  U+34.32
¾       =                 0.75  U+BE
⅔       =    0.666666666666667  U+2154
⅐       =    0.142857142857143  U+2150
⅞       =                0.875  U+215E
³       =                    3  U+B3
₈       =                    8  U+2088
２５    =                   25  U+FF12.FF15
Ⅷ       =                    8  U+2167
ↈ       =               100000  U+2188
৪৭৫৬    =                 4756  U+9EA.9ED.9EB.9EC
𐹸       =                  700  U+10E78
𐄳       =                90000  U+10133
୷       =               0.1875  U+B77
༳       =                 -0.5  U+F33
༪       =                  0.5  U+F2A
༰       =                  6.5  U+F30
৸       =                 0.75  U+9F8
𐅖       =                50000  U+10156
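
A note on what the output above relies on: as far as I read the Unicode::UCD docs, num() accepts either a single character that has a numeric value, or a run of decimal digits that all come from the same script. Anything else comes back as undef. A minimal sketch of those cases (the labels and test strings are my own, picked for illustration):

```perl
use strict;
use warnings;
use charnames qw/:full/;
use Unicode::UCD qw/num/;

# Illustrative inputs only; the labels are not part of any API.
my %case = (
    'single fraction'     => "\N{VULGAR FRACTION THREE QUARTERS}",
    'same-script digits'  => "\N{BENGALI DIGIT FOUR}\N{BENGALI DIGIT SEVEN}",
    'mixed-script digits' => "4\N{BENGALI DIGIT SEVEN}",
    'two fractions'       => "\N{VULGAR FRACTION ONE HALF}\N{VULGAR FRACTION ONE QUARTER}",
);

for my $label (sort keys %case) {
    my $value = num($case{$label});
    # undef means "no single numeric value for this string"
    printf "%-20s %s\n", $label, defined $value ? $value : 'undef';
}
```

So if you need to handle something like a whole number followed by a fraction, you would have to split the string and call num() on the pieces yourself.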

Re^2: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction
by educated_foo (Vicar) on Apr 10, 2011 at 04:59 UTC
    ↈ = 100000 U+2188
    ৪৭৫৬ = 4756 U+9EA.9ED.9EB.9EC
    𐹸 = 700 U+10E78
    This is why Unicode is so great: "four hollow boxes" apparently means 4756, and "one hollow box" means 700, except when it means 1e5. BTW, this is in the latest version of Safari, which seems to make a real effort to do Unicode. It's hard to implement a standard that tries to do text, pictographs, and a bit of typesetting.

      I discovered Symbola through tchrist here and it allows pretty much everything to render.

      You see boxes (or worse, the wrong glyph) if you don't have appropriate fonts for a charset, whether that charset is Unicode or not. Your problem has nothing to do with Unicode.
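
If installing a font is not an option, one workaround is to print the character names instead of relying on glyphs. A rough sketch, assuming charnames::viacode is available in your perl; describe() is a hypothetical helper name, not anything from the modules:

```perl
use 5.010;
use strict;
use warnings;
use charnames ();

# describe() is a made-up helper: one "U+XXXX NAME" line per character.
sub describe {
    my ($string) = @_;
    my @lines;
    for my $char (split //, $string) {
        my $code = ord $char;
        # viacode() returns undef for code points without a name
        my $name = charnames::viacode($code) // '<no name>';
        push @lines, sprintf 'U+%04X %s', $code, $name;
    }
    return @lines;
}

# The two "hollow box" characters from the output quoted above
say for describe("\x{2188}\x{10E78}");
```

That at least tells you which character is hiding behind each box.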
Re^2: Python unicodedata equivalent, or, how to convert unicode fraction to a usable fraction
by parv (Parson) on Jul 13, 2011 at 21:12 UTC

    Thanks, tchrist, for pointing out &Unicode::UCD::num.

    (In retrospect I would have mentioned in OP that I was concerned only about the vulgar-ity of Unicode.)
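
For the vulgar fractions specifically, num() gives a floating-point value. If an exact numerator and denominator would be more useful, the numeric field returned by Unicode::UCD's charinfo() holds the value as a string such as "3/4". A minimal sketch under that assumption; fraction_parts() is a hypothetical helper of mine:

```perl
use strict;
use warnings;
use Unicode::UCD qw/charinfo/;

# fraction_parts() is a made-up helper: returns (numerator, denominator)
# for a single character, or an empty list if it has no numeric value.
sub fraction_parts {
    my ($char) = @_;
    my $info = charinfo(ord $char) or return;
    my $numeric = $info->{numeric};
    return unless defined $numeric && length $numeric;
    my ($num, $den) = split m{/}, $numeric;
    return ($num, defined $den ? $den : 1);    # whole numbers get denominator 1
}

my ($n, $d) = fraction_parts("\x{BE}");
print "$n/$d\n";
```

The pair could then be handed to something like Math::BigRat if exact arithmetic matters.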