Re: help needed in unicode displaying

I'm only guessing here, but if you have a Chinese-enabled version of ms-windows, the dos-prompt window might use cp936 (Extended GB) for Chinese characters, or whatever cp??? applies to Big5 (Traditional) Chinese, rather than unicode.

For that matter, if the dos-prompt window is "unicode-enabled", you might need to use UTF-16LE rather than utf8. You'd have to see whether the so-called "Help" or alleged documentation for that OS can give you any guidance on whether the dos-prompt window supports Chinese characters at all, and if so, what specific encoding is expected.
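To see why the choice of encoding matters, here is a small sketch (not part of the original post) that takes the same four characters used in the example further down (code points 36127, 25285, 36807, 37325, i.e. U+8D1F U+62C5 U+8FC7 U+91CD) and encodes them as cp936, UTF-16LE and UTF-8. Those three encodings are only illustrative candidates; the point is that the console receives completely different bytes in each case.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The four characters from the example string, built from decimal code points.
my $text = join( "", map { chr($_) } ( 36127, 25285, 36807, 37325 ) );

# Encode the same character string into a few candidate byte encodings.
my %bytes;
for my $enc (qw( cp936 UTF-16LE UTF-8 )) {
    $bytes{$enc} = encode( $enc, $text );

    # Show each byte as two hex digits so the differences are visible.
    printf "%-9s %2d bytes: %s\n", $enc, length( $bytes{$enc} ),
        join( " ", map { sprintf "%02X", ord($_) } split //, $bytes{$enc} );
}
```

Each character takes two bytes in cp936 and UTF-16LE but three in UTF-8, so a console expecting one encoding will show garbage if handed another.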

Assuming it is possible, and you can find out what character set to use, Encode and PerlIO are your friends -- you can create a perl-internal utf8 string like this:

my $utf8 = join( "", map { chr() } ( 36127, 25285, 36807, 37325 ));

and then either use Encode::encode() to convert it to something other than utf8 (if necessary), or simply use binmode STDOUT, ":encoding(cp936)"; (substituting another encoding name as needed) so that perl converts the string into the expected character set on output (see perlunicode and perluniintro).
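A minimal sketch (not from the original post) comparing the two routes for cp936. An in-memory filehandle stands in for STDOUT so the resulting bytes can be compared; the variable names are only illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Same four characters, written with \x{...} hex escapes.
my $text = "\x{8D1F}\x{62C5}\x{8FC7}\x{91CD}";

# Route 1: convert explicitly with Encode::encode(), then print the raw bytes.
my $explicit = encode( 'cp936', $text );

# Route 2: let a PerlIO encoding layer convert on output. For the console
# you would instead write:  binmode STDOUT, ':encoding(cp936)';
open( my $fh, '>:encoding(cp936)', \my $via_layer ) or die "open: $!";
print {$fh} $text;
close($fh) or die "close: $!";    # flushes the layer's buffer into $via_layer

print "bytes match: ", ( $explicit eq $via_layer ? "yes" : "no" ), "\n";
```

Use one route or the other, not both: if STDOUT already has an :encoding layer, printing the output of encode() through it would convert the bytes a second time.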

(If it turns out that the dos-prompt window wants utf8 data, just do binmode STDOUT, ":utf8"; so that perl knows you intend to output utf8 data.)
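To illustrate what the :utf8 layer changes, here is a sketch (again using in-memory filehandles in place of STDOUT, an assumption made only so the result can be checked) that prints the same characters with and without the layer. Without it, perl still writes the characters' utf8 bytes but issues a "Wide character in print" warning; with it, the handle is declared as utf8 and the warning goes away.

```perl
use strict;
use warnings;

my $text = "\x{8D1F}\x{62C5}\x{8FC7}\x{91CD}";

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No layer: perl writes the utf8 bytes anyway, but warns about wide characters.
open( my $raw, '>', \my $raw_out ) or die "open: $!";
print {$raw} $text;
close($raw);
my $raw_warnings = grep { /Wide character/ } @warnings;

# With :utf8 (what binmode STDOUT, ":utf8" does for the console): no warning.
@warnings = ();
open( my $u8, '>', \my $u8_out ) or die "open: $!";
binmode( $u8, ':utf8' );
print {$u8} $text;
close($u8);
my $u8_warnings = grep { /Wide character/ } @warnings;

print "warnings without layer: $raw_warnings, with :utf8: $u8_warnings\n";
print "bytes match: ", ( $raw_out eq $u8_out ? "yes" : "no" ), "\n";
```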

(updated to fix missing close-paren in code snippet)


Re^2: help needed in unicode displaying by singam (Initiate) on Mar 10, 2006 at 11:56 UTC
Dear monks, I saw the posts in "help needed in unicode displaying" and tried to write a program, but I came across the following problem. In the $target variable I stored some Chinese characters (for example, three Chinese characters). I tried to get the first two Chinese characters using `substr $target,0 2 ;`, but it is not giving the right answer. Can anyone give me a way to retrieve the first two Chinese characters?
Re^3: help needed in unicode displaying by graff (Chancellor) on Mar 10, 2006 at 23:12 UTC
How about showing us just enough code to demonstrate the problem (i.e. assign characters to a scalar, apply substr to the scalar, print the result in some way), and telling us what you actually get as a result? (In the little snippet you showed, there should be a comma between the 0 and the 2.) For example, this should do what you intended:

my $target = join( "", map { chr() } ( 0x5434, 0x9547, 0x5b87 ));
my $part = substr( $target, 0, 2 );
binmode STDOUT, ":utf8";   # declare utf8 output so the Chinese characters print cleanly
print " length of target = ", length($target);
print "\n length of substr = ", length( $part );
print "\n $target\n $part\n";

The length function should return the character counts (3 for $target, 2 for $part). If you are using a utf8-aware display window, you should see the two strings in Chinese characters (or redirect the output to a file and view that in a utf8-capable display tool, such as a browser). If you get something different, tell us which OS and perl version you have, and be specific about what you actually got.
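singam didn't say how $target was filled, so this is only a guess: a common reason for substr "not giving the appropriate answer" is that the Chinese text is still a string of UTF-8 bytes (for instance, read from a file without a decoding layer, or typed into a script without `use utf8`). In that case substr and length count bytes rather than characters. A minimal sketch of the difference, decoding with Encode::decode() before taking the substring:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Simulate text that arrived as UTF-8 bytes (an assumption about the cause).
my $bytes = encode( 'UTF-8', "\x{5434}\x{9547}\x{5B87}" );

# substr on the byte string counts bytes: 2 bytes is less than one character.
my $wrong = substr( $bytes, 0, 2 );

# Decode into a character string first; now substr counts characters.
my $target = decode( 'UTF-8', $bytes );
my $part   = substr( $target, 0, 2 );

print "byte length: ", length($bytes), ", char length: ", length($target), "\n";
print "first two chars: ",
    join( " ", map { sprintf "U+%04X", ord($_) } split //, $part ), "\n";
```

If the text comes from a file, opening it with a '<:encoding(UTF-8)' layer has the same effect as the decode() call.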