You should be using a very recent version of Perl (5.8.1 or later), and you should look up the man pages "perluniintro", "perlunicode" and the "Encode" module.
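For reference, the Encode module converts between raw byte strings and Perl's internal character strings explicitly, which is handy when the bytes come from somewhere other than a plain file read. A minimal sketch, assuming the input bytes happen to be UTF-8 (the sample string and variable names are only illustrative):

    use strict;
    use warnings;
    use Encode qw(decode encode);

    # $octets holds raw bytes, e.g. as read from a socket or a binary-mode file
    my $octets = "\xE4\xB8\xAD\xE6\x96\x87";          # "中文" encoded as UTF-8
    my $chars  = decode( 'UTF-8', $octets );          # byte string -> character string
    printf "decoded %d characters\n", length $chars;  # prints 2, not 6

    # going the other way: character string -> bytes suitable for output
    my $bytes_out = encode( 'UTF-8', $chars );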
To read Unicode data properly from a file or STDIN (and to write it properly to a file or to STDOUT), the easiest way is to use the appropriate character-encoding "IO layer" (see the docs for PerlIO). Something like this:

    my $open_mode = "<:utf8";   # or "<:encoding(UTF-16LE)", etc.
    open( IN, $open_mode, "unicode_input.txt" );
    binmode( STDOUT, ":utf8" ); # or whatever form of unicode is supported by your display tool
    while (<IN>) {
        # data will be read as (or converted to) utf8 on input
        # do stuff with $_, then
        print;
    }

Apart from that, the code you posted involves a lot of unnecessary effort and temporary storage. You read into a "prehasha" array, then copy that into "char" and "pinyin" arrays, then finally try to put that into a hash (though it's not clear that you succeed there). You could just read into the hash. Following on the snippet above (where the file is being read as Unicode data):
    my %charpinyin;
    while (<IN>) {
        chomp;
        my ( $chchar, $pinyin ) = split /\t/;
        $charpinyin{$chchar} = $pinyin;
    }

(updated last snippet to reflect the OP's use of tab-delimited data)
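Putting the two snippets together, a complete script might look like the sketch below. The file name "unicode_input.txt", the lexical filehandle, and the final print loop are my own additions for illustration, not part of the OP's code, so adjust them to suit:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # assumed file name; substitute whatever the real input file is called
    open( my $in, "<:utf8", "unicode_input.txt" )
        or die "unicode_input.txt: $!";
    binmode( STDOUT, ":utf8" );

    my %charpinyin;
    while (<$in>) {
        chomp;
        my ( $chchar, $pinyin ) = split /\t/;
        $charpinyin{$chchar} = $pinyin;
    }
    close $in;

    # show what was stored, one "character<tab>pinyin" pair per line
    for my $chchar ( sort keys %charpinyin ) {
        print "$chchar\t$charpinyin{$chchar}\n";
    }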
In reply to Re: Reading unicode characters from file by graff
in thread Reading unicode characters from file by Anonymous Monk