Hello monks. I humbly seek your wisdom.

I hit a major snag while writing a script to manipulate Chinese characters, so I thought I would back up and make sure I know exactly how to do various things with Unicode. In the script below, Test #2 (writing to FILE) and Test #3 (writing to Excel) pass: the glyphs display correctly. However, Test #1 (writing to STDOUT) fails: all the Unicode values come out as gibberish (wrong code page, or whatever). I am on Windows 8, using the latest Perl. I have switched my command line to the Lucida Console font, and I tested that Lucida does indeed have the glyphs in my test arrays (I pasted them into Excel, switched the font to Lucida, and they rendered). So my problem remains: how do I get these characters to display at the command line? You can see that I tried to use encode on one of the tests, but that didn't work either.

Script below.

Ah. Note on the script: the characters in @PINYIN_TESTS and @HANZI_TESTS are literal Unicode characters in the script itself.

use Excel::Writer::XLSX;
use Data::Table;
use List::Util;
use Encode;                     # needed for encode and decode
use utf8;                       # needed to get Unicode working in the script itself
use feature 'unicode_strings';  # not entirely sure what this does.

# needed to print UTF-8; however, you must set the right font (Lucida)
# on the cmd window as well
binmode(STDOUT, ":utf8");

# use utf8 needed for these two arrays to work
my @PINYIN_TESTS = ("ā","ē","ī","ō","ū","ǖ","á","é","í","ó","ú","ǘ","ǎ","ě","ǐ","ǒ","ǔ","ǚ","à","è","ì","ò","ù","ǜ");
my @HANZI_TESTS = ("大","小","很","狗","马","鸟");

my @HANZI_TESTS_DEC  = (22823, 23567, 24456, 29399, 39532, 40479);
my @HANZI_TESTS_HEX  = ("U+5927", "U+5C0F", "U+5F88", "U+72D7", "U+9A6C", "U+9E1F");
my @HANZI_TESTS_HEXX = ("\x{5927}", "\x{5C0F}", "\x{5F88}", "\x{72D7}", "\x{9A6C}", "\x{9E1F}");
my @HANZI_TESTS_HEXN = ("\N{U+5927}", "\N{U+5C0F}", "\N{U+5F88}", "\N{U+72D7}", "\N{U+9A6C}", "\N{U+9E1F}");

my $fileout = "unicode_testing.txt";

# WRITING HARDCODED VALUES

# TEST 1 - WRITE A HARDCODED UNICODE VALUE TO STDOUT
print "TESTING PINYIN\n";
# tried encode() here; note that STDOUT already has a :utf8 layer
print encode('UTF-8', $_) for @PINYIN_TESTS;
print "\n";
print "TESTING HANZI\n";
print for @HANZI_TESTS;
print "\n";
print "TESTING HEXX\n";
print for @HANZI_TESTS_HEXX;
print "\n";
print "TESTING HEXN\n";
print for @HANZI_TESTS_HEXN;
print "\n";

# TEST 2 - WRITE A HARDCODED UNICODE VALUE TO A FILE
open(FILE, '>', $fileout) or die "Cannot open $fileout: $!";
binmode(FILE, ":utf8");  # needed to print UTF-8
print FILE "FILE TESTING PINYIN\n";
print FILE $_ for @PINYIN_TESTS;
print FILE "\n";
print FILE "\nFILE TESTING HANZI\n";
print FILE $_ for @HANZI_TESTS;
print FILE "\n";
print FILE "\nFILE TESTING HEXX\n";
print FILE $_ for @HANZI_TESTS_HEXX;
print FILE "\n";
print FILE "\nFILE TESTING HEXN\n";
print FILE $_ for @HANZI_TESTS_HEXN;
print FILE "\n";
close(FILE);

# TEST 3 - WRITE A HARDCODED UNICODE VALUE TO A FILE USING EXCEL
my $workbook  = Excel::Writer::XLSX->new('unicode_testing.xlsx');
my $worksheet = $workbook->add_worksheet();
# one column per test array; rows start at 1
for my $i (0 .. $#PINYIN_TESTS) {
    $worksheet->write("A" . ($i + 1), $PINYIN_TESTS[$i]);
}
for my $i (0 .. $#HANZI_TESTS) {
    $worksheet->write("B" . ($i + 1), $HANZI_TESTS[$i]);
}
for my $i (0 .. $#HANZI_TESTS_HEXX) {
    $worksheet->write("C" . ($i + 1), $HANZI_TESTS_HEXX[$i]);
}
for my $i (0 .. $#HANZI_TESTS_HEXN) {
    $worksheet->write("D" . ($i + 1), $HANZI_TESTS_HEXN[$i]);
}
$workbook->close();

# TESTS TO DO
# READING VALUES FROM FILE, FROM STDIN
# MANIPULATING VALUES WITH REGEX
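
A side note on the encode attempt in Test 1. As far as I can tell it cannot help, because STDOUT already has a :utf8 layer, so a string passed through encode first ends up encoded twice. The snippet below is a minimal sketch of that effect and not part of the script above. The variable names are made up for illustration, and the second encode call only stands in for what the output layer would do to bytes that are already encoded. It does not address the console code page question.

```perl
use strict;
use warnings;
use utf8;                 # the source contains a literal Chinese character
use Encode qw(encode);

# Format a string as space-separated hex values, one per character/byte.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, shift }

my $char  = "大";                        # one character, U+5927
my $once  = encode('UTF-8', $char);      # three bytes
my $twice = encode('UTF-8', $once);      # each byte re-encoded as if it were a character

print "char : ", hex_dump($char),  "\n"; # 5927
print "once : ", hex_dump($once),  "\n"; # E5 A4 A7
print "twice: ", hex_dump($twice), "\n"; # C3 A5 C2 A4 C2 A7
```

So with binmode(STDOUT, ":utf8") in place, printing the plain character string already produces the bytes on the "once" line, and adding encode on top produces the "twice" line instead.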

In reply to Displaying unicode chars at command line by poivre
