Firstly, it seems I gave you something of a bum steer regarding &Unicode::UCD::charprops_all. I've been using it for a while and forgot that it was a fairly recent function (in terms of Perl versions). It was added in v5.22.0, along with some other functions, so won't be available on any of the Perl versions you're using. Sorry about that. See perl5220delta: Updated Modules and Pragmata.

[Just a quick note on markup. While it's generally preferable to use '<code>' tags for code and data, which you've been doing, this doesn't work too well with Unicode characters (outside the ASCII range). In these cases, '<pre>' tags work better: for instance, you'll see 'α' instead of '&#945;'. For inline text, such as in paragraphs, '<tt>' tags serve the same purpose.]

As I have neither Ubuntu nor AIX, I can't effectively reproduce your results. However, I looked into this a bit further and have a few other suggestions.

As you successfully printed the characters from the codepoints:

$ perl -C -E 'say "\x{3b1} - \x{df} - \x{a3}"'
α - ß - £

[I didn't need it, but you may need to add -Mutf8 to get rid of the "Wide character" message you're seeing.]

See what names you get for those characters:

$ perl -Mcharnames=:full -E 'say charnames::viacode($_) for (0x3b1, 0xdf, 0xa3)'
GREEK SMALL LETTER ALPHA
LATIN SMALL LETTER SHARP S
POUND SIGN

In terms of what you're referring to as "aliases", I suspect there's a very large number of these. Have a look at charnames; in particular, read what it says about :full, :loose and :short. There's a link to the algorithm for :loose matching, but it's horribly broken: it should be "http://www.unicode.org/reports/tr44/#Matching_Names". How :short is determined is explained on the charnames page.

:full is fairly straightforward:

$ perl -C -E 'say "\N{GREEK SMALL LETTER ALPHA}"'
α

Based on that #Matching_Names algorithm, I then tried:

$ perl -C -E 'say "\N{greek small letter alpha}"'
Unknown charname 'greek small letter alpha' at -e line 1, within string
Execution of -e aborted due to compilation errors.

However, when I specified -Mcharnames=:loose, it worked:

$ perl -Mcharnames=:loose -C -E 'say "\N{greek small letter alpha}"'
α

Bearing in mind the :loose algorithm, you can see there's a huge number of possibilities. Here are a few examples:

$ perl -Mcharnames=:loose -C -E 'say "\N{GREEK_SMALL_LETTER_ALPHA}"'
α
$ perl -Mcharnames=:loose -C -E 'say "\N{GREEK-SMALL-LETTER-ALPHA}"'
α
$ perl -Mcharnames=:loose -C -E 'say "\N{GREEK-SMALL_LETTER-ALPHA}"'
α
$ perl -Mcharnames=:loose -C -E 'say "\N{greek_small_letter_alpha}"'
α
$ perl -Mcharnames=:loose -C -E 'say "\N{greek-small_letter-alpha}"'
α
$ perl -Mcharnames=:loose -C -E 'say "\N{greek small-letter alpha}"'
α
$ perl -Mcharnames=:loose -C -E 'say "\N{GrEeK SmAlL-LeTtEr aLpHa}"'
α

Now, as shown in my earlier post, I was able to use the :short forms directly:

$ perl -C -E 'say "\N{greek:alpha}"'
α
$ perl -Mcharnames=greek -C -E 'say "\N{alpha}"'
α

They didn't work for you, but maybe these might:

$ perl -Mcharnames=:short -C -E 'say "\N{greek:alpha}"'
α
$ perl -Mcharnames=:short,greek -C -E 'say "\N{alpha}"'
α
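
If you want to find out quickly which of these options your Perl accepts, a small loop along these lines might save some typing. It's a sketch only: the list of options is just the ones mentioned in this post, and it reports success or failure without showing the error text.

```shell
# Try \N{greek:alpha} under several charnames options and report which
# ones compile and run on this Perl. Output and errors are discarded;
# only the exit status is used.
for opt in "-Mcharnames=:full" "-Mcharnames=:short" "-Mcharnames=:short,greek"; do
    if perl "$opt" -C -e 'print "\N{greek:alpha}"' >/dev/null 2>&1; then
        echo "ok:     perl $opt"
    else
        echo "failed: perl $opt"
    fi
done
```

Drop the redirection to /dev/null if you want to see the actual messages for the failing cases.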

I also had a brief look at the source code for charnames.pm and _charnames.pm, although I didn't delve into them too deeply. There are a lot of (non-POD) comments that may be of interest. Perhaps have a look at those for the versions you're using.

— Ken


In reply to Re^3: Unknown charnames when building Encode by kcott
in thread Unknown charnames when building Encode by yulivee07
