PerlMonks  


G'day fireblood,

For general troubleshooting of this type of problem, you need to assess the Unicode abilities of all elements involved.

Firstly, check that the code point is a valid Unicode code point with a printable character assigned to it. Note that, although the code point may be in a valid block, i.e. a range of code points, it may not be a printable character: it may be unassigned, reserved, a control character, or similar. See the "Unicode Code Charts".

Next, check Perl's capabilities. If you look in the Miscellaneous section of perldoc, you'll find the perldelta pages. These will tell you which version of Unicode is supported by which version of Perl. They only mention Unicode when a new Unicode version is supported, so finding the right page can take some hunting around: check the zero subversions (5.22.0, 5.24.0, etc.) first. For your version up to the latest:

Perl version    Unicode version supported
5.22.0          7.0
5.24.0          8.0
5.26.0          9.0

The Unicode::UCD module (UCD = "Unicode Character Database") can provide you with a lot of other useful information. Here are just a few examples:

Which Unicode version does your current Perl support? I'm using Perl 5.26, so it shows Unicode 9; you're using 5.22, so it should show Unicode 7.

$ perl -E 'use Unicode::UCD; say Unicode::UCD::UnicodeVersion'
9.0.0

In which version of Unicode did a character first appear (given by the "Age" property)? Here are a couple: one from your post; one I happened to know was a recent addition.

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+5C0D", "Age")'
V1_1
$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1F9C0", "Age")'
V8_0

If I switch to Perl 5.22, the output from that last command becomes:

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1F9C0", "Age")'
Unassigned

Note that, in isolation, that output is indistinguishable from the output for a code point which isn't actually assigned in any Unicode version (see the next command); however, if you did the "valid Unicode code point" check first, as suggested, you'll know the difference.

$ perl -E 'use Unicode::UCD "charprop"; say charprop("U+1E95A", "Age")'
Unassigned

[See Unicode code charts (PDF): "Supplemental Symbols and Pictographs" for U+1F9C0 (a recently added emoji which looks like a wedge of cheese); "Adlam" for U+1E95A (no special significance: Adlam was alphabetically first when searching for a block with an unassigned code point; U+1E95A just happened to be in a noticeable gap between assigned code points).]
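If you want the checks above in one place, something along these lines might do. It's only a sketch: describe_code_point() is a name I've made up for illustration (it is not part of Unicode::UCD), and it assumes a Perl recent enough to have charprop(). It reports what the running Perl knows about a code point, along with the Unicode version that knowledge is based on, so "Unassigned" can be read as "unassigned as far as this Perl's Unicode version is concerned". It can't tell you whether a later Unicode version assigned the code point; the code charts remain the authority for that.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charprop);

# Hypothetical helper (not part of Unicode::UCD): summarise what the
# running perl knows about one code point, given as a number.
sub describe_code_point {
    my ($cp) = @_;
    return {
        code_point      => sprintf('U+%04X', $cp),
        # Unicode version whose data this perl was built with.
        unicode_version => Unicode::UCD::UnicodeVersion(),
        # "Age" is e.g. V1_1, or Unassigned if this perl has no data for it.
        age             => charprop($cp, 'Age'),
        # True if this perl's Unicode data has the code point assigned.
        assigned        => (chr($cp) =~ /\p{Assigned}/ ? 1 : 0),
    };
}

# The three code points discussed above.
for my $cp (0x5C0D, 0x1F9C0, 0x1E95A) {
    my $info = describe_code_point($cp);
    printf "%-8s Age=%-10s assigned=%d (per Unicode %s)\n",
        @{$info}{qw(code_point age assigned unicode_version)};
}
```

Running the same script under different Perl versions should show the U+1F9C0 line changing while the other two stay the same.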

Next, you'll need to check the Unicode support available for your operating system, the application you're using to display the characters, fonts being used and so on. I don't have those available; however, this would (as far as I know) be valid from a Cygwin command line, and may provide some insight:

$ perl -C -E 'say "\x{5c0d}"'
對
$ echo "對"
對
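To separate "Perl is emitting the wrong bytes" from "the terminal or font can't display them", it may help to look at the bytes themselves. The following is a minimal sketch, assuming a terminal that expects UTF-8; utf8_hex() is a hypothetical helper name of my own, and the explicit binmode call stands in for the -C switch used above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: hex dump of the UTF-8 bytes for one code point.
sub utf8_hex {
    my ($cp) = @_;
    my $bytes = encode('UTF-8', chr($cp));
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# Encode on output explicitly instead of relying on the -C switch.
binmode STDOUT, ':encoding(UTF-8)';

# Print the character next to its encoded bytes.
printf "U+%04X  %s  bytes: %s\n", 0x5C0D, chr(0x5C0D), utf8_hex(0x5C0D);
```

If the bytes shown are E5 B0 8D but the character itself is displayed as a box or question mark, Perl has done its part and the problem is more likely in the terminal encoding or the font.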

Note that I used <pre> tags for that last part. When showing characters outside the ASCII range, these are a better choice than <code> tags, which will often just render them as entity references (e.g. &#x5C0D;).

— Ken


In reply to Re: printing Unicode works for some characters but not all by kcott
in thread printing Unicode works for some characters but not all by fireblood
