C  An unsigned char (octet, 8-bit) value.
W  An unsigned char value (can be greater than 255).
So why can "C" values be greater than 255?
#unpack "C*", $unicode_string.$unicode_string;
#("UNSIGNED OCTETS(C*) ", 12354, 12354)
This seems strange...
Do you mean my example should use "W" for unpack? If so, does that make sense? The result is the same on my machine. My point is that @bytes does not hold bytes; it holds the decimal code point of "HIRAGANA LETTER A".
I really should read packtut.

use Encode;

$code_point = 0x3042;    # HIRAGANA LETTER A
$unicode_string = pack('U*', $code_point);
@bytes = unpack("W*", $unicode_string);
print join('|', @bytes), "\n";
# ==> these are not bytes, but an array of code points

$code_point = 0x3042;    # HIRAGANA LETTER A
$unicode_string = pack('U*', $code_point);
@bytes = map { sprintf("%X", $_) }
         unpack("W*", Encode::encode('utf8', $unicode_string));
print join('|', @bytes), "\n";
# ==> these are the UTF-8 octets, printed in hex
Update:
I found this description in perlunicode:
" pack("C") and unpack("C") are methods for emulating byte-oriented chr() and ord() on Unicode strings. While these methods reveal the internal encoding of Unicode strings, that is not something one normally needs to care about at all."
So I think it should be like this, shouldn't it?

# this is wrong
@bytes = unpack("C*", $unicode_string);
# this is right
@bytes = unpack("C*", Encode::encode('utf8', $unicode_string));
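
To make the distinction concrete, here is a minimal self-contained sketch of what I mean. It assumes a Perl recent enough to have the "W" template (5.10 or later) and uses Encode's strict 'UTF-8' name instead of the lax 'utf8'; the variable names are mine, not from perluniintro.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $code_point     = 0x3042;                    # HIRAGANA LETTER A
my $unicode_string = pack('U*', $code_point);   # a string of one character

# Code points: one number per character, possibly greater than 255.
my @code_points = unpack('W*', $unicode_string);

# Octets: encode to UTF-8 first, then unpack the resulting byte string.
my $octets = encode('UTF-8', $unicode_string);
my @bytes  = unpack('C*', $octets);

printf "characters: %d, code points: %s\n",
    length($unicode_string), join('|', @code_points);
printf "octets: %d, bytes (hex): %s\n",
    length($octets), join('|', map { sprintf '%02X', $_ } @bytes);
```

On my understanding this reports one character with code point 12354, and three octets E3|81|82. Encoding explicitly before unpacking means the result no longer depends on how Perl happens to store the string internally.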
In reply to Re^4: Example of perluniintro
by remiah
in thread Example of perluniintro
by remiah