remiah has asked for the wisdom of the Perl Monks concerning the following question:
"A" is 0x41 as a byte and 0x41 as a code point.
"HIRAGANA LETTER A" is 0xe3, 0x81, 0x82 as UTF-8 bytes and 0x3042 as a code point.
    #hex dump of A
    #00000000 41 |A|
    #00000001

    #hex dump of HIRAGANA LETTER A
    #00000000 e3 81 82 |...|
    #00000003
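The byte and code point values above can be reproduced with a minimal sketch like the following. The helper name `describe_char` is hypothetical, and the sketch assumes UTF-8 is the encoding of interest.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: show one code point next to its UTF-8 bytes (in hex).
sub describe_char {
    my ($code_point) = @_;
    my $char  = chr($code_point);          # character string holding one code point
    my $bytes = encode('UTF-8', $char);    # byte string, UTF-8 encoded
    my @hex   = map { sprintf '%02x', $_ } unpack('C*', $bytes);
    return sprintf('U+%04X -> %s', ord($char), join(' ', @hex));
}

print describe_char(0x41),   "\n";    # U+0041 -> 41
print describe_char(0x3042), "\n";    # U+3042 -> e3 81 82
```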
Two code examples follow.
Devel::Peek shows that, for HIRAGANA LETTER A, $native_string is UTF-8 flagged and $native_string2 is not.

    # Example 1: the native string may not be a native string
    # Code: $native_string = pack('W*', unpack('U*', $unicode_string));
    use strict;
    use warnings;
    use Encode qw(encode);
    use Devel::Peek;
    use 5.012;

    my ($code_point, $unicode_string, $native_string, $native_string2);

    $code_point     = 0x41;    # "A"
    $unicode_string = pack('U*', $code_point);
    $native_string  = pack('W*', unpack('U*', $unicode_string));
    Dump $unicode_string;
    Dump $native_string;       # ==> here it is not UTF-8 flagged

    $code_point      = 0x3042; # HIRAGANA LETTER A
    $unicode_string  = pack('U*', $code_point);
    $native_string   = pack('W*', unpack('U*', $unicode_string));
    $native_string2  = Encode::encode('utf8', $unicode_string);
    Dump $unicode_string;
    Dump $native_string;       # ==> this is UTF-8 flagged; maybe transparently
                               #     upgraded because the code point is > 255
    Dump $native_string2;
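As a lighter-weight alternative to reading Dump output, the flag state of each string can be checked directly with `utf8::is_utf8`. This is only a sketch: the helper `flag_state` and the variable names `$encoded_bytes` and `$ascii_native` are hypothetical, and it uses the strict 'UTF-8' encoding name rather than the 'utf8' of the example above.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: report whether Perl's internal UTF8 flag is set.
sub flag_state {
    my ($string) = @_;
    return utf8::is_utf8($string) ? 'on' : 'off';
}

my $unicode_string = pack('U*', 0x3042);                         # HIRAGANA LETTER A
my $native_string  = pack('W*', unpack('U*', $unicode_string));  # one character, value 0x3042
my $encoded_bytes  = encode('UTF-8', $unicode_string);           # byte string e3 81 82
my $ascii_native   = pack('W*', 0x41);                           # "A", fits in one byte

printf "%-15s UTF8 flag %s\n", 'unicode_string', flag_state($unicode_string);
printf "%-15s UTF8 flag %s\n", 'native_string',  flag_state($native_string);
printf "%-15s UTF8 flag %s\n", 'encoded_bytes',  flag_state($encoded_bytes);
printf "%-15s UTF8 flag %s\n", 'ascii_native',   flag_state($ascii_native);
```

Only the encoded string and the ASCII string come back unflagged; `pack('W*', ...)` has to produce a flagged string as soon as a value above 255 is packed, because such a value cannot be stored in a single byte.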
    # Example 2: these are not bytes, they are an array of code points
    # Code: @bytes = unpack("C*", $unicode_string);
    use strict;
    use warnings;
    use Encode qw(encode);
    use 5.012;

    my ($code_point, $unicode_string, @bytes);

    $code_point     = 0x41;    # "A"
    $unicode_string = pack('U*', $code_point);
    @bytes          = unpack("C*", $unicode_string);
    print join('|', @bytes), "\n";

    $code_point     = 0x3042;  # HIRAGANA LETTER A
    $unicode_string = pack('U*', $code_point);
    @bytes          = unpack("C*", $unicode_string);
    print join('|', @bytes), "\n";    # ==> these are not bytes, but an array of code points

    $code_point     = 0x3042;  # HIRAGANA LETTER A
    $unicode_string = pack('U*', $code_point);
    @bytes          = map { sprintf("%X", $_) }
                      unpack("C*", Encode::encode('utf8', $unicode_string));
    print join('|', @bytes), "\n";
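The distinction Example 2 is probing, characters versus encoded bytes, also shows up in string lengths and in a decode round trip. A small sketch, assuming UTF-8 as the encoding; the variable names `$byte_string` and `$round_trip` are made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $unicode_string = chr(0x3042);                        # HIRAGANA LETTER A, one character
my $byte_string    = encode('UTF-8', $unicode_string);   # its UTF-8 encoding as a byte string
my $round_trip     = decode('UTF-8', $byte_string);      # back to a character string

printf "characters: %d\n", length $unicode_string;       # counts characters
printf "bytes:      %d\n", length $byte_string;          # counts bytes
printf "round trip: %s\n", $round_trip eq $unicode_string ? 'same string' : 'different';
```

Here `length` reports 1 for the character string and 3 for the encoded byte string, and decoding the bytes gives back a string equal to the original.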
So I would like to hear the monks' suggestions, comments, or even just "read this document"; anything is welcome. I am now reading perlunicode.
Regards.
Re: Example of perluniintro
by Anonymous Monk on Aug 18, 2012 at 03:39 UTC
by remiah (Hermit) on Aug 18, 2012 at 04:14 UTC
by Anonymous Monk on Aug 18, 2012 at 04:30 UTC
by remiah (Hermit) on Aug 18, 2012 at 05:00 UTC
by Anonymous Monk on Aug 18, 2012 at 07:22 UTC