in reply to Re^3: Example of perluniintro
in thread Example of perluniintro
"C" is an unsigned char (octet, 8-bit) value.
"W" is an unsigned char value (can be greater than 255).
So why can "C" values come out greater than 255?
#unpack "C*", $unicode_string.$unicode_string;
#("UNSIGNED OCTETS(C*) ", 12354, 12354)
This seems strange...
Do you mean my example should use "W" for unpack? If so, does this make sense? The result is the same on my machine. My point is that @bytes does not hold bytes; it holds the decimal code point of "HIRAGANA LETTER A".
I really should read packtut.

$code_point=0x3042; #HIRAGANA LETTER A
$unicode_string=pack('U*', $code_point);
@bytes=unpack("W*", $unicode_string);
print join('|', @bytes), "\n"; #==> these are not bytes, but an array of code points

use Encode; # needed for Encode::encode below
$code_point=0x3042; #HIRAGANA LETTER A
$unicode_string=pack('U*', $code_point);
@bytes=map{ sprintf("%X",$_) } unpack("W*", Encode::encode('utf8', $unicode_string));
print join('|', @bytes), "\n";
Update:
I came across this description in perlunicode:
" pack("C") and unpack("C") are methods for emulating byte-oriented chr() and ord() on Unicode strings. While these methods reveal the internal encoding of Unicode strings, that is not something one normally needs to care about at all."
So I think the following is true, isn't it?

# this is wrong
@bytes=unpack("C*", $unicode_string);
# this is right
@bytes=unpack("C*", Encode::encode('utf8', $unicode_string));
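To make the distinction concrete, here is a minimal sketch that puts the character view and the byte view side by side. It assumes Perl 5.10 or later (for the "W" template) and the core Encode module; the variable names @code_points, $octets and $round_trip are mine and only for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# HIRAGANA LETTER A as a one-character string.
my $unicode_string = "\x{3042}";

# Character view: one element per character (the "W" template needs Perl 5.10+).
my @code_points = unpack("W*", $unicode_string);

# Byte view: encode to UTF-8 octets first, then unpack those octets.
my $octets = encode('UTF-8', $unicode_string);
my @bytes  = unpack("C*", $octets);

# Print hex values only, so no wide characters are written to STDOUT.
printf "code points: %s\n", join('|', map { sprintf "%X", $_ } @code_points);
printf "bytes:       %s\n", join('|', map { sprintf "%X", $_ } @bytes);

# Round trip: decoding the octets gives back the original character string.
my $round_trip = decode('UTF-8', $octets);
print "round trip ok\n" if $round_trip eq $unicode_string;
```

On my understanding this should print 3042 for the code points and E3|81|82 for the bytes, which is the UTF-8 encoding of U+3042. The point of the sketch is only that the encode step is what turns characters into bytes; unpack by itself does not.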
Re^5: Example of perluniintro
by Anonymous Monk on Aug 18, 2012 at 07:22 UTC
by remiah (Hermit) on Aug 18, 2012 at 08:09 UTC