Re^9: Standard handles inherited from a utf-8 enabled shell

You mentioned (emphasis mine):

interpretation

that number

I agree with this, but I believe we have different assumptions about what is meant by interpretation. Look, I need a way to refer to that number, because that is fundamental. I call that number a "character". The value of that number is what I call the "codepoint value". Bear with me: forget "Unicode" for now, and grant me the use of those words. At any time, you may s/character|codepoint/_that_number_/gi.

Before that sentence, you mentioned:

It is a byte! An 8-bit bit pattern stored in a 8-bit unit of memory and nothing else.
Well, that number is 255 == ord(pack 'B8', '11111111'). Saying it's a (single) byte means you've established the number of bits for it is 8. That, to me, is giving the number an interpretation(*). This observation is very important when it comes to the subject of encoding, especially when we're to print that character (i.e. that number).

If you want to print a string, you should avoid any preconceived notion of how many bits the string "has" before deciding which encoding to use. I find that thinking in terms of characters (i.e. those numbers) and their codepoint values (i.e. the number values) helps tremendously when handling strings, right up to the point where they are encoded for print. That is my thought process, and the message I was trying to deliver.
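To make that concrete, here is a minimal sketch, assuming the core Encode module and two encodings picked only for illustration (ISO-8859-1 and UTF-8). It takes the number from above and shows that the count of octets is only settled once an encoding is chosen.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The "character" from the discussion: a one-character string whose
# ordinal (codepoint value) is 255.
my $char      = pack 'B8', '11111111';
my $codepoint = ord $char;

# The same number becomes a different count of octets depending on the
# encoding chosen at output time.
my $latin1 = encode('ISO-8859-1', $char);
my $utf8   = encode('UTF-8',      $char);

printf "codepoint %d: %d octet(s) as ISO-8859-1, %d octet(s) as UTF-8\n",
    $codepoint, length($latin1), length($utf8);
```

The codepoint value stays 255 throughout; only the encoded forms differ (one octet in ISO-8859-1, two in UTF-8).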

(*) I am aware of the details of how perl stores that number in memory, but I am not as well versed in them as you are. I would like to reiterate that this discussion is about print and encoding, and that the ordinal of the character is what matters here.

The important part is that the OS cannot preserve what it has no knowledge of.
Agreed.

There is no concept of encoding attached to the file descriptors.
And that's the thing: the concept of encoding alone does not make sense without the concept of characters (what we're encoding). And those characters can only exist within the process (e.g. numbers in Perl's "string"). Our computer "systems" (e.g. web browser, text editor, terminal, program, etc.) do this decode-incoming-octets-then-output-octets-already-encoded dance between each other to hand off characters.
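The dance can be sketched roughly as follows. This is only an illustration: the incoming octets are hard-coded and assumed to be UTF-8 (in practice that has to be known out of band), and UTF-16BE as the outgoing encoding is an arbitrary choice.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Octets as they might arrive from another "system"; assumed to be UTF-8.
my $incoming = "\xC3\xBF\xE2\x98\xBA";

# Decode: octets become characters (numbers) inside the process.
my $chars      = decode('UTF-8', $incoming);
my @codepoints = map { ord } split //, $chars;

# Encode: characters become octets again for the next "system",
# here in a different encoding chosen only for illustration.
my $outgoing = encode('UTF-16BE', $chars);

printf "%d octets in, %d characters (%s), %d octets out\n",
    length($incoming), length($chars), join(' ', @codepoints),
    length($outgoing);
```

Inside the process there are two characters (255 and 9786); the five octets that came in and the four that go out are just two different ways of handing those numbers over.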

When Perl warns you about "Wide character in print", what it's really saying is: Please be explicit about the encoding so that I can tell the next "system" about my characters accurately, using only octets.
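A small sketch of that warning and of being explicit about the encoding. It writes to in-memory filehandles so that it runs without touching the terminal, and it captures warnings in a hypothetical @warnings array; with a real STDOUT the explicit version would be binmode(STDOUT, ':encoding(UTF-8)').

```perl
use strict;
use warnings;

my $string = "\x{263A}";    # one character, codepoint value 0x263A

# Collect warnings instead of printing them, so they can be inspected.
my @warnings;
local $SIG{__WARN__} = sub { push @warnings, @_ };

# 1. No encoding declared on the handle: perl warns about the wide character.
my $implicit = '';
open my $raw, '>', \$implicit or die "open: $!";
print {$raw} $string;
close $raw;

# 2. Encoding stated explicitly via an I/O layer: no warning.
my $explicit = '';
open my $enc, '>:encoding(UTF-8)', \$explicit or die "open: $!";
print {$enc} $string;
close $enc;

printf "warnings collected: %d\n", scalar @warnings;
print  "first warning: $warnings[0]" if @warnings;
```

Only the first print should produce a warning; the second handle has been told which encoding to use, so perl can turn the character into octets without guessing.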

The bottom line -- for this thread, rather than this subthread -- is that the OP must have omitted some details from his scenario.
Agreed.


Replies are listed 'Best First'.

Re^10: Standard handles inherited from a utf-8 enabled shell
by BrowserUk (Patriarch) on Mar 22, 2012 at 11:38 UTC

Well, that number is 255. Saying it's a (single) byte means you've established the number of bits for it is 8.

No. You've got that backward. At the point the value is returned from pack, it isn't even a number. It is just 8 bits.

They could represent anything, including 8 physically grouped but otherwise unrelated discrete boolean values -- the current on/offness of the headlights, sidelight and tail-lights on a car; yes/no answers to a survey.

Referring to (not interpreting as) that bit pattern using 255/0377/xff is just easier than 0b11111111.
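As a rough illustration of the last two points: the spellings below all name the same value, and the second bit pattern is a hypothetical set of eight on/off flags, with no arithmetic meaning attached to it.

```perl
use strict;
use warnings;

# The bit pattern from the discussion, as returned by pack.
my $byte = pack 'B8', '11111111';

# Four spellings that Perl source code accepts for the same value.
my @spellings  = (255, 0377, 0xff, 0b11111111);
my $mismatches = grep { $_ != 255 } @spellings;

# A hypothetical second pattern read as eight independent on/off flags
# (e.g. survey answers).
my @flags = split //, unpack('B8', pack('B8', '10110001'));

printf "spellings that differ: %d\n", $mismatches;
printf "bits of \$byte: %s\n", unpack('B8', $byte);
printf "flags: %s\n", join(',', @flags);
```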

That, to me, is giving the number an interpretation. This observation is very important when it comes to the subject of encoding, especially when we're to print that character (i.e. that number).

Sorry, but you are assuming that the 8 bits represent something to do with "strings & characters and codepoints". It could just as well be 1 byte of a 4 or 8 byte memory address; or part of an IP address; or a sound level ...

The point of my asking the question was to try to make sense of the description given by the OP (of the other thread). I knew that I couldn't replicate his apparent scenario on my system, but I am not familiar with the workings of Unicode on *nix.

It was conceivable to me that, when running on a "unicode enabled terminal", there might be some default interpretation of the byte values printed to that terminal that might be inherited by processes spawned from that terminal.

I am informed that there isn't!

But it was vaguely conceivable that there might be. And that might have been an explanation for the OP's apparent problem.
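A minimal sketch of that idea, using four hypothetical octet values: nothing in the octets themselves says which of these readings is intended.

```perl
use strict;
use warnings;

# Four octets with hypothetical values; the last one is the 0xFF pattern.
my $octets = pack 'C4', 192, 168, 0, 255;

# The same octets read in several unrelated ways.
my $as_uint32 = unpack 'N', $octets;               # big-endian 32-bit unsigned
my $as_ip     = join '.', unpack('C4', $octets);   # dotted-quad IPv4 address
my @as_levels = unpack 'C4', $octets;              # four 8-bit sample levels
my @as_chars  = map { ord } split //, $octets;     # four character ordinals

printf "uint32: %s\n", $as_uint32;
printf "ip:     %s\n", $as_ip;
printf "levels: %s\n", join(' ', @as_levels);
printf "chars:  %s\n", join(' ', @as_chars);
```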

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?
