Re: Example of perluniintro

Re^2: Example of perluniintro by remiah (Hermit) on Aug 18, 2012 at 04:14 UTC
Thank you for the reply. I am looking for confirmation: either the author of perluniintro forgot to encode characters to bytes, or I am missing something. What do you think?
Re^3: Example of perluniintro by Anonymous Monk on Aug 18, 2012 at 04:30 UTC
"Whether the author of perluniintro forgets to encode characters to bytes, or I am missing something. What do you think?"

I don't think the author forgot anything, but I'm not sure what you think the author forgot. Consider these three lines of output. Do you see something wrong with them?

`#!/usr/bin/perl --
use strict;
use warnings;
use Data::Dump;

my $code_point = 0x3042; # HIRAGANA LETTER A aka 12354
my $unicode_string = pack('U', $code_point);

dd 12354 => pack('U', 12354);
dd "UNSIGNED CHARS(W) ", pack "W*", unpack "U*", $unicode_string.$unicode_string;
dd "UNSIGNED OCTETS(C*) ", unpack "C*", $unicode_string.$unicode_string;
__END__
(12354, "\x{3042}")
("UNSIGNED CHARS(W) ", "\x{3042}\x{3042}")
("UNSIGNED OCTETS(C*) ", 12354, 12354)`
Re^4: Example of perluniintro by remiah (Hermit) on Aug 18, 2012 at 05:00 UTC
I saw the output...

C: An unsigned char (octet, 8-bit) value.
W: An unsigned char value (can be greater than 255).

So why can "C" values become greater than 255?

#unpack "C*", $unicode_string.$unicode_string;
#("UNSIGNED OCTETS(C*) ", 12354, 12354)

This seems strange... Do you mean my example should use "W" for unpack? If so, does this make sense? The result is the same on my machine. My point is that @bytes does not hold bytes; it holds the decimal code points for "HIRAGANA LETTER A".

`$code_point=0x3042; #HIRAGANA LETTER A
$unicode_string=pack('U', $code_point);
@bytes=unpack("W*", $unicode_string);
print join('|', @bytes), "\n"; #==>these are not bytes, but an array of code points

use Encode; # needed for Encode::encode below
$code_point=0x3042; #HIRAGANA LETTER A
$unicode_string=pack('U', $code_point);
@bytes=map{ sprintf("%X",$_) } unpack("W*", Encode::encode('utf8', $unicode_string));
print join('|', @bytes), "\n";`

I really should read perlpacktut. I am waiting for your reply.

Update: I found this description in perlunicode: "pack("C") and unpack("C") are methods for emulating byte-oriented chr() and ord() on Unicode strings. While these methods reveal the internal encoding of Unicode strings, that is not something one normally needs to care about at all."

So I think the following holds:

`# this is wrong
@bytes=unpack("C*", $unicode_string);
# this is right
@bytes=unpack("C*", Encode::encode('utf8',$unicode_string));`

Doesn't it?
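To make the "encode first, then unpack" idea concrete, here is a minimal sketch. It assumes only the core Encode module. The name $octets is mine and does not come from perluniintro, and I use the strict 'UTF-8' encoding name rather than 'utf8'; for this character both should give the same octets.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

my $code_point     = 0x3042;                   # HIRAGANA LETTER A
my $unicode_string = pack('U', $code_point);   # a string of one character

# Turn the character string into UTF-8 octets explicitly,
# then unpack those octets as unsigned 8-bit values.
my $octets = encode('UTF-8', $unicode_string);
my @bytes  = unpack('C*', $octets);

# Show each octet in hex, then compare the two lengths.
print join('|', map { sprintf '%02X', $_ } @bytes), "\n";    # E3|81|82
printf "characters: %d, octets: %d\n",
    length($unicode_string), length($octets);                # characters: 1, octets: 3
```

After the explicit encode, every value in @bytes fits in 8 bits, so the result no longer depends on how perl happens to store the string internally.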
Re^5: Example of perluniintro by Anonymous Monk on Aug 18, 2012 at 07:22 UTC
Re^6: Example of perluniintro by remiah (Hermit) on Aug 18, 2012 at 08:09 UTC