in reply to Re: Example of perluniintro
in thread Example of perluniintro
I am looking for confirmation: either the author of perluniintro forgot to encode characters to bytes, or I am missing something. What do you think?
Re^3: Example of perluniintro
by Anonymous Monk on Aug 18, 2012 at 04:30 UTC
"Whether the author of perluniintro forgets to encode characters to bytes, or I am missing something. What do you think?"
I don't think the author forgot anything, but I'm not sure what you think the author forgot. Consider these three lines of output. Do you see something wrong with them?
by remiah (Hermit) on Aug 18, 2012 at 05:00 UTC
C is an unsigned char (octet, 8-bit) value.
So why can "C" values be greater than 255? Do you mean my example should use "W" for unpack? If so, does this make sense? The result is the same on my machine. My point is that @bytes does not hold bytes; it holds the decimal code point for "HIRAGANA LETTER A". I really should read packtut. I am waiting for your reply.
update:
"pack("C") and unpack("C") are methods for emulating byte-oriented chr() and ord() on Unicode strings. While these methods reveal the internal encoding of Unicode strings, that is not something one normally needs to care about at all."
So, I think
doesn't it?
by Anonymous Monk on Aug 18, 2012 at 07:22 UTC
"So, why "C" values could become greater than 255? this seems strange..."
It's all strange to me, I'm not joking. From http://perldoc.perl.org/5.14.1/functions/pack.html
So trying that I get
So, yes, I think I agree, it's a mistake, in that it should probably say
"You can find the bytes that make up a UTF-8 sequence with:"
And this seems to confirm that
update: It says in another part of perluniintro
"One way of peeking inside the internal encoding of Unicode characters is to use unpack("C*", ... to get the bytes of whatever the string encoding happens to be, or unpack("U0..", ...) to get the bytes of the UTF-8 encoding:"
So yeah, whatever perl's actual internal format is (the one we shouldn't care about), it is not utf8. If you want the UTF-8 bytes, you need U0C*; otherwise (it looks like) you get IV bytes.
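For reference, here is a minimal sketch of the distinction discussed above, assuming Perl 5.10 or later and the core Encode module. The variable names are illustrative and are not taken from perluniintro. It unpacks the same one-character string (HIRAGANA LETTER A, U+3042) three ways: as character values with "W*", as UTF-8 bytes with "U0C*", and as UTF-8 bytes after encoding explicitly.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character: HIRAGANA LETTER A (U+3042).
my $str = "\x{3042}";

# "W*" returns character values (code points), which may exceed 255.
my @code_points = unpack("W*", $str);

# A leading "U0" makes unpack walk the UTF-8 encoding of the string,
# so "C*" then yields one value per UTF-8 byte.
my @u0_bytes = unpack("U0C*", $str);

# Encoding explicitly produces a byte string; "C*" on it yields one value per byte.
my @encoded_bytes = unpack("C*", encode("UTF-8", $str));

print "code points:   @code_points\n";     # 12354
print "U0C* bytes:    @u0_bytes\n";        # 227 129 130
print "encoded bytes: @encoded_bytes\n";   # 227 129 130
```

Encoding explicitly with Encode names the intended encoding in the code itself, which is arguably easier to read than relying on unpack's mode switches.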
by remiah (Hermit) on Aug 18, 2012 at 08:09 UTC