Some of the confusion may be due to history. Prior to Perl 5.8, strings were for practical purposes simply bytes, so length could only return a byte count. Usable support for character encodings arrived with the Encode module in 5.8 (so says its documentation: Encode - I'm not at all an encoding guru, but your question got me curious).

From what I understand of that document, if the string is marked as utf8 (a flag set in the C guts of Perl), its length is counted in characters, because Perl knows that a single character may span several bytes. Otherwise its length is counted in bytes. You can see the flag value using is_utf8 (not _is_utf8; the underscore-prefixed functions in that document are _utf8_on and _utf8_off). The flag is normally set for you when you read input through a decoding layer such as :encoding(UTF-8), or when you decode the data yourself, but if you aren't sure about the history of the string you can use that function to check its status. For more information, see the section "Messing with Perl's Internals" in Encode.
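To make the difference concrete, here is a minimal sketch, assuming a reasonably recent Encode. The sample string and the variable names are mine, not from the original question. It takes the same data once as raw octets and once decoded into characters, then compares length and the flag:

```perl
use strict;
use warnings;
use Encode qw(decode is_utf8);

# Raw octets: "é" encoded as UTF-8 is the two bytes 0xC3 0xA9.
my $bytes = "caf\xC3\xA9";

# Decoding turns the octets into a character string (utf8 flag on).
my $chars = decode('UTF-8', $bytes);

printf "bytes: length=%d is_utf8=%s\n",
    length($bytes), is_utf8($bytes) ? 'yes' : 'no';   # length=5, flag off
printf "chars: length=%d is_utf8=%s\n",
    length($chars), is_utf8($chars) ? 'yes' : 'no';   # length=4, flag on
```

The flag is best treated as a debugging aid, as here, rather than something program logic should branch on.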

There are also functions for explicitly choosing whether your string is treated as raw bytes or as characters, and for choosing the rules for converting back and forth between raw bytes and utf8 - see the same document for encode, decode and from_to.
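A rough sketch of those three functions, again with a made-up sample string and variable names of my own choosing. Note that from_to works on octets and changes its first argument in place, so the example copies the data first:

```perl
use strict;
use warnings;
use Encode qw(encode decode from_to);

my $chars  = "caf\x{E9}";               # character string: c, a, f, e-acute
my $octets = encode('UTF-8', $chars);   # characters -> UTF-8 octets
my $back   = decode('UTF-8', $octets);  # UTF-8 octets -> characters

# from_to converts octets between two encodings, modifying its first
# argument in place and returning the new length in octets.
my $latin1 = $octets;
my $n = from_to($latin1, 'UTF-8', 'ISO-8859-1');

printf "chars=%d octets=%d latin1=%d\n",
    length($chars), length($octets), length($latin1);
```

Here the character string and the Latin-1 octets are both 4 long, while the UTF-8 octets are 5 long, because the accented character takes two bytes in UTF-8.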

Update: added more information about controlling the utf8 status.


In reply to Re: Size and anatomy of an HTTP response by ELISHEVA
in thread Size and anatomy of an HTTP response by Discipulus