You mentioned (emphasis mine):


I agree with this, but I believe we have different assumptions about what is meant by interpretation. Look, I need a way to refer to that number, because that is fundamental. I call that number a "character". The value of that number is what I call the "codepoint value". Bear with me: forget "Unicode" for now, and grant me the use of those words. At any time, you may s/character|codepoint/_that_number_/gi.

Before that sentence, you mentioned:


Well, that number is 255 == ord(pack 'B8', '11111111'). Saying it's a (single) byte means you've established the number of bits for it is 8. That, to me, is giving the number an interpretation(*). This observation is very important when it comes to the subject of encoding, especially when we're to print that character (i.e. that number).
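To make the point concrete, here is a minimal sketch (the variable names are mine, purely for illustration). It builds that number three different ways, and then builds a character whose codepoint value does not fit in 8 bits. In every case ord() just reports the number; nothing in the string says "I am N bits wide".

```perl
use strict;
use warnings;

# The same number, reached three ways.
my $from_bits = pack 'B8', '11111111';   # one-character string built from 8 bits
my $from_chr  = chr(255);
my $from_hex  = "\xFF";

printf "%d %d %d\n", ord($from_bits), ord($from_chr), ord($from_hex);   # 255 255 255

# A character whose codepoint value needs more than 8 bits.
my $wide = chr(0x263A);
printf "length=%d ord=%d\n", length($wide), ord($wide);                 # length=1 ord=9786
```

Both $from_bits and $wide are strings of length 1. How many octets each becomes is only settled once an encoding is chosen.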

If you want to print a string, you should avoid any preconceived notion of how many bits the string "has" prior to deciding which encoding to use. I find that thinking in terms of characters (i.e. those numbers) and what their codepoint values (i.e. the number values) are helps tremendously in my handling of strings, up to the point where they are encoded for output by print. That is my thought process, and the message I was trying to deliver.

(*) I am aware of the details of how perl stores that number in memory, but I am not as well versed in them as you are. I would like to reiterate that this discussion is about print and encoding, and that the ordinal of the character is what matters here.

Agreed.


And that's the thing: the concept of encoding alone does not make sense without the concept of characters (what we're encoding). And those characters can only exist within the process (e.g. numbers in Perl's "string"). Our computer "systems" (e.g. web browser, text editor, terminal, program, etc.) do this decode-incoming-octets-then-output-octets-already-encoded dance between each other to hand off characters.
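A minimal sketch of one round of that dance, using the core Encode module. I am assuming UTF-8 as the agreed encoding on both sides, and the sample octets and variable names are hypothetical.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from another "system"
# (assumption: the sender used UTF-8; these three octets are U+263A).
my $incoming_octets = "\xE2\x98\xBA";

# Decode: octets -> characters (numbers inside Perl's string).
my $chars = decode('UTF-8', $incoming_octets);
printf "octets in:  %d\n", length($incoming_octets);                  # 3
printf "characters: %d (ord %d)\n", length($chars), ord($chars);      # 1 (ord 9786)

# Encode: characters -> octets for the next "system".
my $outgoing_octets = encode('UTF-8', $chars);
printf "octets out: %s\n",
    join ' ', map { sprintf '%02X', ord($_) } split //, $outgoing_octets;   # E2 98 BA
```

Only the middle step, $chars, holds characters. Everything that crosses the process boundary is octets.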

When Perl warns you about "Wide character in print", what it's really saying is: Please be explicit about the encoding so that I can tell the next "system" about my characters accurately, using only octets.
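Here is a small sketch of that warning and of being explicit. To keep it self-contained I print to in-memory filehandles instead of STDOUT (that is my choice for the example, not something from the thread), and I capture warnings in an array so they can be inspected.

```perl
use strict;
use warnings;

my $smiley = chr(0x263A);   # a character with a codepoint value above 255

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 1. A handle with no encoding layer: Perl has to guess how to turn
#    the character into octets, and warns about it.
open my $raw, '>', \my $raw_buf or die "open: $!";
print {$raw} $smiley;
close $raw;

# 2. A handle with an explicit encoding: no guessing, no warning.
open my $utf8, '>:encoding(UTF-8)', \my $utf8_buf or die "open: $!";
print {$utf8} $smiley;
close $utf8;

printf "warnings seen: %d\n", scalar @warnings;
print  "first warning: $warnings[0]" if @warnings;
printf "octets written with explicit encoding: %d\n", length($utf8_buf);
```

For a real STDOUT the equivalent of step 2 would be binmode(STDOUT, ':encoding(UTF-8)'), or encoding the string yourself with Encode::encode before printing.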


Agreed.

In reply to Re^9: Standard handles inherited from a utf-8 enabled shell by repellent
in thread Standard handles inherited from a utf-8 enabled shell by BrowserUk
