in reply to Size and anatomy of an HTTP response
Some of the confusion may be due to history. Prior to Perl 5.8, strings were simply bytes, so length could only return a byte count. Support for character encodings was introduced in 5.8 (so says the Encode documentation. I'm not at all an encoding guru, but your question got me curious).
From what I understand of that document, if the string is marked as utf8 (a flag bit set in the C guts of Perl), its length is counted in characters, because Perl knows to check whether each byte is a complete character or only part of one. Otherwise its length is counted in bytes. You can see the flag value using is_utf8 (Encode exports it on request). The flag is normally set automatically when your input is decoded into characters, for example when you read through an input stream that has an encoding layer, but if you aren't sure about the history of the string you can use that function to check its status. For more information, see the section on messing with Perl's internals in Encode.
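To make the difference concrete, here is a minimal sketch of how I understand it. The sample string and variable names are my own invention, and it assumes the Encode module that ships with Perl 5.8 and later:

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Raw octets: "caf" followed by the two-byte UTF-8 encoding of
# e-acute (U+00E9), which is C3 A9. This sample data is made up.
my $octets = "caf\xC3\xA9";

# Decoding turns the octets into a character string.
my $chars = decode('UTF-8', $octets);

my $octet_len   = length $octets;            # counted in bytes
my $char_len    = length $chars;             # counted in characters
my $flag_octets = is_utf8($octets) ? 1 : 0;  # flag off for the raw bytes
my $flag_chars  = is_utf8($chars)  ? 1 : 0;  # flag on after decoding

print "octets: length=$octet_len utf8=$flag_octets\n";  # octets: length=5 utf8=0
print "chars:  length=$char_len utf8=$flag_chars\n";    # chars:  length=4 utf8=1
```

The same text gives two different answers from length depending only on whether it has been decoded. That matters if you are trying to work out a byte size for something like an HTTP body.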
There are also functions for explicitly choosing whether your string is treated as raw bytes or as utf8 characters, and for choosing the rules for converting back and forth between raw bytes and utf8 - see the same document for encode, decode and from_to.
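As a rough illustration of those three functions, again with made-up sample data and variable names, and assuming the encoding names ISO-8859-1 and UTF-8 as Encode knows them:

```perl
use strict;
use warnings;
use Encode qw(decode encode from_to);

# "caf" plus e-acute as a single ISO-8859-1 byte (E9): 4 octets.
my $latin1_octets = "caf\xE9";

# decode: octets in a named encoding -> Perl character string.
my $chars = decode('ISO-8859-1', $latin1_octets);

# encode: Perl character string -> octets in a named encoding.
my $utf8_octets = encode('UTF-8', $chars);

# from_to: convert octets from one encoding to another, in place.
# It returns the new length in octets, or undef on failure.
my $buffer  = $latin1_octets;
my $new_len = from_to($buffer, 'ISO-8859-1', 'UTF-8');

printf "characters: %d\n", length $chars;        # characters: 4
printf "utf8 octets: %d\n", length $utf8_octets; # utf8 octets: 5
printf "from_to length: %d\n", $new_len;         # from_to length: 5
```

Note that from_to works purely on octets and modifies its first argument, so it never hands you a character string. Use decode when you want length to count characters.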
Update: added more information about controlling the utf8 status.