Handling Traditional Chinese Characters

dhinesh has asked for the wisdom of the Perl Monks concerning the following question:

Re: Handling Traditional Chinese Characters by ikegami (Patriarch) on Jun 18, 2009 at 18:15 UTC
"Please let me know where I'm going wrong."

You seem to have forgotten to give us something from which we could identify something wrong. My first test would be to use Devel::Peek's `Dump` to check if I get what I need from Excel. Please provide the Dump of a variable that should contain Chinese chars.
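For anyone unsure what such a Dump looks like, here is a minimal sketch. The variable `$chars` is a hypothetical stand-in for a value read from the spreadsheet, since the original code was never posted; the second variable holds the same text as undecoded UTF-8 bytes for comparison.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);
use Encode qw(encode);

# Hypothetical stand-in for a cell value read from the spreadsheet:
# the two characters U+4E2D U+6587 as a decoded Perl character string.
my $chars = "\x{4E2D}\x{6587}";

# The same text as raw UTF-8 bytes, i.e. not decoded.
my $bytes = encode('UTF-8', $chars);

# Dump writes to STDERR. The decoded string has UTF8 among its FLAGS;
# the byte string does not.
Dump($chars);
Dump($bytes);

# A quick summary of the same difference without reading the full dump.
printf "chars: length %d, utf8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
printf "bytes: length %d, utf8 flag %s\n",
    length($bytes), utf8::is_utf8($bytes) ? 'on' : 'off';
```

If the value coming out of Excel looks like the second dump, or has a length that does not match the number of visible characters, the data has probably not been decoded yet.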
Re: Handling Traditional Chinese Characters by afoken (Chancellor) on Jun 18, 2009 at 18:25 UTC
Are you using strict? Did you enable warnings? Did you tell perl how to write Unicode characters to the text file? Does your text file viewer know how (i.e. in which encoding) perl wrote the Unicode characters to the file? What "excel convertor code from CPAN" did you use? (Tell us the URL!)

And by the way: Show us the code, wrapped in CODE-tags.

Alexander
-- Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
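As a rough illustration of "telling perl how to write Unicode characters", the sketch below writes a string through an `:encoding(UTF-8)` layer and then reads the file back as raw bytes. The sample text and the temporary file are assumptions made for the example; they are not taken from the original poster's program.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: the two characters U+7E41 U+9AD4.
my $text = "\x{7E41}\x{9AD4}";

# Temporary file standing in for the real output file.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# The :encoding(UTF-8) layer tells perl how to turn characters into bytes.
open my $out, '>:encoding(UTF-8)', $path or die "Cannot write $path: $!";
print {$out} $text, "\n";
close $out or die "Cannot close $path: $!";

# Read the file back as raw bytes to see what actually landed on disk.
open my $in, '<:raw', $path or die "Cannot read $path: $!";
my $raw = do { local $/; <$in> };
close $in;

printf "%d characters written as %d bytes: %s\n",
    length($text) + 1, length($raw),
    join ' ', map { sprintf '%02X', ord } split //, $raw;
```

Without the layer, perl emits a "Wide character in print" warning (when warnings are enabled) and the viewer has to guess the encoding, which is one common source of garbled output.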
Re: Handling Traditional Chinese Characters by derby (Abbot) on Jun 18, 2009 at 18:32 UTC
Without any code to review, I would assume that the Chinese characters are in UCS-2 (UCS-2BE) in Excel and you're treating them like UTF-8 (or yikes ISO-8859-1) when outputting. Can you post just those snippets of the code that read the Excel cell values?

-derby
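If that guess is right, the fix is to decode from the encoding the data is really in before encoding for output. A minimal sketch follows, assuming the cell arrives as big-endian 16-bit units; the byte string is made up for illustration, and `UTF-16BE` is used because it reads UCS-2 data identically for characters in the Basic Multilingual Plane.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical raw cell value: U+4E2D U+6587 stored as big-endian 16-bit units.
my $raw = "\x4E\x2D\x65\x87";

# Decode from the encoding the data is actually in...
my $chars = decode('UTF-16BE', $raw);

# ...then encode to the encoding the output should use.
my $utf8 = encode('UTF-8', $chars);

# Misreading the same bytes as ISO-8859-1 yields four unrelated characters.
my $wrong = decode('ISO-8859-1', $raw);

printf "decoded: %d chars (%s)\n", length($chars),
    join ' ', map { sprintf 'U+%04X', ord } split //, $chars;
printf "UTF-8:   %d bytes\n", length($utf8);
printf "wrong:   %d chars\n", length($wrong);
```

Whether the spreadsheet module hands back UCS-2 bytes, UTF-8 bytes, or already-decoded characters depends on the module and its options, so check the actual value before picking the source encoding.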
Re: Handling Traditional Chinese Characters by Polyglot (Chaplain) on Jun 18, 2009 at 23:48 UTC
From experience I can tell you that not every CPAN module is capable of properly handling the Asian languages, especially Chinese, Japanese, and Korean (CJK). I would guess, in fact, that the majority of them are not compatible with these languages. I have frequently had to write my own code to deal with them because of this. Here are some tips on ways to deal with everything in UTF-8:

    use Encode qw(encode decode);

    # Send UTF-8 to STDOUT (here as CGI output)
    binmode STDOUT, ':utf8';
    print "Content-type: text/html; charset=utf-8\n\n";

    # Read and write files through an encoding layer
    open SOURCE, '<:encoding(utf8)', $sourcefile or die "Cannot open source! $!\n";
    open (TARGET, ">:encoding(utf8)", "$targetfile") or die "Cannot open target file! $!\n";

    # Declare the encoding in the generated HTML as well
    print TARGET <<HTML;
    <html lang="utf8">
    <head>
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8">
    ...
    <form name="myform" method="POST" accept-encoding="UTF-8" accept-charset="utf-8" action="$thisprogram">
    ...
    HTML

    # Decode lines that are still raw bytes
    foreach $line (@source) {
        $line = decode("utf-8", $line);
    }

Note that you do not need to do all of these at once. For example, if you already read the file in through an `:encoding(utf8)` layer, do not decode each line of the file as UTF-8 again: the lines are already character strings, and decoding them a second time is at best redundant and can fail once they contain wide characters.

Blessings,
~ Polyglot ~
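To make the point about decoding twice concrete, here is a small sketch. The input bytes are a made-up example; the second `decode` is wrapped in `eval` because Encode raises an exception when asked to decode a string that contains characters above 0xFF.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the UTF-8 bytes for U+4E2D U+6587.
my $bytes = "\xE4\xB8\xAD\xE6\x96\x87";

# First decode: six bytes become a two-character string.
my $once = decode('UTF-8', $bytes);

# Second decode of the already-decoded string. Trap the exception
# so the outcome can be reported instead of ending the program.
my $twice = eval { decode('UTF-8', $once) };
my $error = $@;

printf "after one decode: %d characters\n", length($once);
print defined $twice
    ? "second decode returned a value\n"
    : "second decode failed: $error";
```

The practical rule is to decode exactly once at the input boundary (an `:encoding` layer or an explicit `decode`) and encode exactly once at the output boundary.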