poivre has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks. I humbly seek your wisdom.

I hit a major snag while writing a script to manipulate Chinese characters, so I backed up to make sure I know exactly how to do various things with Unicode. In the script below, Test #2 (writing to a file) and Test #3 (writing to Excel) pass: the glyphs display correctly. However, Test #1 (writing to STDOUT) fails: all the Unicode values come out as gibberish (wrong code page or whatever). I am on Windows 8, using the latest Perl. I have switched my command-line window to the Lucida Console font, and I checked that Lucida does indeed have the glyphs in my test arrays (I pasted them into Excel, switched the font to Lucida, and they rendered). So my problem remains: how do I get these characters to display at the command line? You can see that I tried to use encode on one of the tests, but that didn't work either.

Script below.

Ah. A note on the script: in my actual file, the characters in @PINYIN_TESTS and @HANZI_TESTS are literal Unicode characters, not numeric entities or escapes.

use Excel::Writer::XLSX;
use Data::Table;
use List::Util;
use Encode;                     # needed for encode and decode
use utf8;                       # needed to get Unicode working in the script itself
use feature 'unicode_strings';  # not entirely sure what this does.

# needed to print UTF-8; however, you must set the right font (Lucida) on the cmd window as well
binmode(STDOUT, ":utf8");

# use utf8 needed for these two arrays to work
my @PINYIN_TESTS = ("ā","ē","ī","ō","ū","ǖ","á","é","í","ó","ú","ǘ","ǎ","ě","ǐ","ǒ","ǔ","ǚ","à","è","ì","ò","ù","ǜ",);
my @HANZI_TESTS = ("大","小","很","狗","马","鸟");

my @HANZI_TESTS_DEC  = (22823, 23567, 24456, 29399, 39532, 40479);
my @HANZI_TESTS_HEX  = ("U+5927", "U+5C0F", "U+5F88", "U+72D7", "U+9A6C", "U+9E1F");
my @HANZI_TESTS_HEXX = ("\x{5927}", "\x{5C0F}", "\x{5F88}", "\x{72D7}", "\x{9A6C}", "\x{9E1F}");
my @HANZI_TESTS_HEXN = ("\N{U+5927}", "\N{U+5C0F}", "\N{U+5F88}", "\N{U+72D7}", "\N{U+9A6C}", "\N{U+9E1F}");

my $fileout = "unicode_testing.txt";

# WRITING HARDCODED VALUES

# TEST 1 - WRITE A HARDCODED UNICODE VALUE TO STDOUT
print "TESTING PINYIN\n";
map { print encode('UTF-8', $_) } @PINYIN_TESTS;   # tried encode here; it did not help
print "\n";
print "TESTING HANZI\n";
map { print $_ } @HANZI_TESTS;
print "\n";
print "TESTING HEXX\n";
map { print $_ } @HANZI_TESTS_HEXX;
print "\n";
print "TESTING HEXN\n";
map { print $_ } @HANZI_TESTS_HEXN;
print "\n";

# TEST 2 - WRITE A HARDCODED UNICODE VALUE TO A FILE
open(FILE, '>', $fileout) or die "Cannot open $fileout: $!";
binmode(FILE, ":utf8");   # needed to print UTF-8
print FILE "FILE TESTING PINYIN\n";
map { print FILE $_ } @PINYIN_TESTS;
print FILE "\n";
print FILE "\nFILE TESTING HANZI\n";
map { print FILE $_ } @HANZI_TESTS;
print FILE "\n";
print FILE "\nFILE TESTING HEXX\n";
map { print FILE $_ } @HANZI_TESTS_HEXX;
print FILE "\n";
print FILE "\nFILE TESTING HEXN\n";
map { print FILE $_ } @HANZI_TESTS_HEXN;
print FILE "\n";
close(FILE);

# TEST 3 - WRITE A HARDCODED UNICODE VALUE TO A FILE USING EXCEL
my $workbook  = Excel::Writer::XLSX->new( 'unicode_testing.xlsx' );
my $worksheet = $workbook->add_worksheet();
for (my $i=0; $i<=$#PINYIN_TESTS; $i++) {
    my $row = $i + 1;
    $worksheet->write( "A" . $row, $PINYIN_TESTS[$i] );
}
for (my $i=0; $i<=$#HANZI_TESTS; $i++) {
    my $row = $i + 1;
    $worksheet->write( "B" . $row, $HANZI_TESTS[$i] );
}
for (my $i=0; $i<=$#HANZI_TESTS_HEXX; $i++) {
    my $row = $i + 1;
    $worksheet->write( "C" . $row, $HANZI_TESTS_HEXX[$i] );   # was $HANZI_TESTS[$i], so column C never tested HEXX
}
for (my $i=0; $i<=$#HANZI_TESTS_HEXN; $i++) {
    my $row = $i + 1;
    $worksheet->write( "D" . $row, $HANZI_TESTS_HEXN[$i] );   # was $HANZI_TESTS[$i], so column D never tested HEXN
}
$workbook->close();

# TESTS TO DO
# READING VALUES FROM FILE, FROM STDIN
# MANIPULATING VALUES WITH REGEX
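One thing that can be checked without involving the console or the font at all is which bytes Test 1 actually emits. The sketch below is only an illustration, not part of the original script: the helper bytes_written is a made-up name, and it prints to an in-memory handle instead of STDOUT so the bytes can be inspected. It compares three cases: the :utf8 layer alone, encode alone on a raw handle, and both together (which is what the PINYIN line in Test 1 does).

```perl
use strict;
use warnings;
use utf8;                 # this file contains a literal non-ASCII character
use Encode qw(encode);

my $char = "大";          # U+5927, a single character

# Hypothetical helper: print $string to an in-memory handle opened with
# the given layer, and return the bytes that arrived as a hex string.
sub bytes_written {
    my ($layer, $string) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer) or die "Cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh);
    return join ' ', map { sprintf '%02X', ord($_) } split //, $buffer;
}

# Case 1: character string through a :utf8 layer (like the HANZI tests).
my $layer_only  = bytes_written(':utf8', $char);

# Case 2: manually encoded bytes through a raw handle.
my $encode_only = bytes_written(':raw', encode('UTF-8', $char));

# Case 3: manually encoded bytes through a :utf8 layer (like the PINYIN test).
my $both        = bytes_written(':utf8', encode('UTF-8', $char));

print "layer only  : $layer_only\n";    # E5 A4 A7
print "encode only : $encode_only\n";   # E5 A4 A7
print "both        : $both\n";          # C3 A5 C2 A4 C2 A7
```

The first two cases give the same three UTF-8 bytes, while the third is encoded twice, so encode and binmode(STDOUT, ":utf8") should not be combined. If the bytes from the layer-only case are right and the console still shows gibberish, the remaining suspect is how the console interprets those bytes (for example its active code page). That part is an assumption to verify on the Windows machine itself; this sketch does not test it.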

Replies are listed 'Best First'.
Re: Displaying unicode chars at command line
by Anonymous Monk on Jan 31, 2013 at 08:17 UTC
Re: Displaying unicode chars at command line
by nikosv (Deacon) on Jan 31, 2013 at 12:27 UTC