Something to keep in mind is that the fact that a term can mean two different things doesn't necessarily mean it's a confusing term. If only one of the definitions is used in a given context, then there's no confusion. For example, it would be perfectly fine to say that pack( "N", 200 ) returns a string of 4 bytes. It doesn't matter how many bytes of storage are used for it. It's still the same string of 4 bytes if you upgrade the resulting scalar, because we're not talking about how it's stored. We're talking about how it would be stored in a stream. So yeah, the string of 4 bytes might use 5 bytes of storage in the string buffer. And the same string of 4 bytes takes up 42 bytes of memory (according to Devel::Size). Yet none of that is confusing.
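To make the 4-versus-5 distinction concrete, here is a minimal sketch. The helper name buffer_bytes is made up for this illustration, and it uses bytes::length only to peek at the internal buffer, which isn't something normal code should do. The Devel::Size figure is left out since that module isn't core and its numbers vary by build.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling byte semantics

# pack("N", 200) yields a 32-bit big-endian unsigned integer: 00 00 00 C8.
my $packed = pack("N", 200);

# Upgrade a copy so both storage formats can be compared.
my $upgraded = $packed;
utf8::upgrade($upgraded);    # changes the internal storage format only

# Illustrative helper (not a standard function): size of the string buffer.
sub buffer_bytes {
    my ($s) = @_;
    return bytes::length($s);
}

printf "elements (downgraded): %d\n", length($packed);
printf "elements (upgraded):   %d\n", length($upgraded);
printf "same string:           %s\n", $packed eq $upgraded ? "yes" : "no";
printf "buffer (downgraded):   %d\n", buffer_bytes($packed);
printf "buffer (upgraded):     %d\n", buffer_bytes($upgraded);
```

Both copies have a length of 4 and compare equal. Only the buffer size differs (4 versus 5), because the element 0xC8 needs two bytes in the upgraded format.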


a character is encoded by 1-4 characters ... WTF? )

Would you say "I put the vehicle on the vehicle", or would you say "I drove my car onto the ferry"?

The fact that the process involves string operations and thus characters isn't relevant. You always have the option of being more specific (e.g. using Code Point instead of character, and byte instead of character) if it makes things clearer.

That's why one would say a Code Point encodes to one to four bytes.
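As a rough illustration of that phrasing, the sketch below encodes a few Code Points to UTF-8 with the core Encode module and counts the resulting bytes. The helper name utf8_len and the sample Code Points are my own picks for the example.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Illustrative helper: number of bytes in the UTF-8 encoding of one Code Point.
sub utf8_len {
    my ($cp) = @_;
    return length( encode("UTF-8", chr($cp)) );
}

# One sample from each length class: A, e-acute, euro sign, an emoji.
for my $cp (0x41, 0xE9, 0x20AC, 0x1F600) {
    printf "U+%04X encodes to %d byte(s)\n", $cp, utf8_len($cp);
}
```

The four samples encode to 1, 2, 3 and 4 bytes respectively. Note that length() here is applied to the encoded string, whose elements are bytes.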

with it's one to four bytes.

Each character takes up 1 to 13 bytes of storage, actually.

As you pointed out are most (not all) string operators in Perl "character" based.

I did not say that. Quite the opposite. All string operations deal with characters. By definition. A string is made up of characters. I literally said that's the name of the elements of a string.

If a terminology invites for misunderstandings one should chose a new word.

I welcome an unambiguous word for "string element". But until one comes around, I shall continue defining the terminology I use, and part of that is defining a character to be a string element. Cause no one wants me to say string elements.

But did you notice that the terminology I said I use didn't mention "character" at all? I think you're trying to convince me that "character" is confusing, yet I didn't even use the word! It's a term that can usually be avoided entirely. It only comes into play when dealing with strings of arbitrary content. There's usually no reason for a term for string elements otherwise.


In reply to Re^16: Seeking Perl docs about how UTF8 flag propagates (Terminology) by ikegami
in thread Seeking Perl docs about how UTF8 flag propagates by raygun
