You mentioned (emphasis mine):
No interpretation of the meaning (nor even signedness) is placed (nor could be) upon that number until you do something with it!
I agree with this, but I believe we have different assumptions about what is meant by interpretation. Look, I need a way to refer to that number, because that is fundamental. I call that number a "character". The value of that number is what I call the "codepoint value". Bear with me: forget "Unicode" for now, and grant me the use of those words. At any time, you may s/character|codepoint/_that_number_/gi.
Before that sentence, you mentioned:
It is a byte! An 8-bit bit pattern stored in a 8-bit unit of memory and nothing else.
Well, that number is 255 == ord(pack 'B8', '11111111'). Saying it's a (single) byte means you've established that the number of bits for it is 8. That, to me, is giving the number an interpretation(*). This observation is very important when it comes to the subject of encoding, especially when we're about to print that character (i.e. that number).
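To make that concrete, here is a minimal sketch, assuming the core Encode module and two encodings I picked purely for illustration (ISO-8859-1 and UTF-8). The variable names are mine. It shows that the same number, 255, turns into a different count of octets depending on which encoding is chosen:

```perl
use strict;
use warnings;
use Encode qw(encode);

# One character whose codepoint value is 255.
my $char = chr(255);

# The same number becomes a different run of octets under each encoding.
my $latin1 = encode('ISO-8859-1', $char);   # 1 octet:  FF
my $utf8   = encode('UTF-8',      $char);   # 2 octets: C3 BF

printf "ord=%d latin1=%s utf8=%s\n",
    ord($char),
    uc(unpack('H*', $latin1)),
    uc(unpack('H*', $utf8));
```

Neither result is "the" byte for 255; the number of bits only appears once an encoding has been decided.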
If you want to print a string, you should avoid any preconceived notion of how many bits the string "has" prior to deciding which encoding to use. I find that thinking in terms of characters (i.e. those numbers) and what their codepoint values (i.e. the number values) are helps tremendously in my handling of strings up to the point where they are encoded using print. That is my thought process, and the message I was trying to deliver.
(*) I am aware of the details of how perl stores that number in memory, though I am not as well versed in them as you. I would like to reiterate that this discussion is about print and encoding, and that the ordinal of the character is what matters here.
The important part is that the OS cannot preserve what it has no knowledge of.
Agreed.
There is no concept of encoding attached to the file descriptors.
And that's the thing: the concept of encoding alone does not make sense without the concept of characters (what we're encoding). And those characters can only exist within the process (e.g. numbers in Perl's "string"). Our computer "systems" (e.g. web browser, text editor, terminal, program, etc.) do this decode-incoming-octets-then-output-octets-already-encoded dance between each other to hand off characters.
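A rough sketch of one step of that dance, assuming the core Encode module. The incoming octets, the choice of UTF-8 on the way in and UTF-16BE on the way out, and the variable names are all made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from another "system" (assumed here to be UTF-8).
my $incoming = "\xE2\x82\xAC";

# Decode: the octets become characters (numbers) inside this process.
my $string = decode('UTF-8', $incoming);

# Encode: the characters become octets again for the next "system",
# which in this sketch is assumed to expect UTF-16BE.
my $outgoing = encode('UTF-16BE', $string);

printf "chars=%d ord=0x%X in=%s out=%s\n",
    length($string),
    ord($string),
    uc(unpack('H*', $incoming)),
    uc(unpack('H*', $outgoing));
```

Three octets came in and two went out, yet the character handed off in the middle is the same single number throughout.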
When Perl warns you about "Wide character in print", what it's really saying is: Please be explicit about the encoding so that I can tell the next "system" about my characters accurately, using only octets.
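Here is a small sketch of that, using in-memory filehandles so the result can be inspected. The character, the UTF-8 layer, and the variable names are my own choices for illustration. The first print goes through a handle with no encoding layer, the second through one where the encoding has been made explicit:

```perl
use strict;
use warnings;

my $string = chr(0x263A);    # one character, codepoint value 9786

# Collect warnings instead of letting them go to STDERR.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# No encoding layer: Perl has not been told how to turn 9786 into octets.
open my $raw, '>', \my $raw_buf or die $!;
print {$raw} $string;
close $raw;
my $warned_raw = grep { /Wide character/ } @warnings;

# Explicit encoding layer: the handle knows how to produce the octets.
@warnings = ();
open my $enc, '>:encoding(UTF-8)', \my $enc_buf or die $!;
print {$enc} $string;
close $enc;
my $warned_enc = grep { /Wide character/ } @warnings;

print "without layer: warned=$warned_raw\n";
print "with layer:    warned=$warned_enc octets=", uc(unpack('H*', $enc_buf)), "\n";
```

With a real STDOUT, the equivalent of the second half would be something like binmode(STDOUT, ':encoding(UTF-8)') before printing.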
The bottom line -- for this thread, rather than this subthread -- is that the OP must have omitted some details from his scenario.
Agreed.