in reply to Re^3: Standard handles inherited from a utf-8 enabled shell
in thread Standard handles inherited from a utf-8 enabled shell

the example there isn't minimal at all

Agreed. I also have no faith in the fidelity of the OP's descriptions.

I'm not sure what happens on windows

It is not at all clear to me that the OP of that thread is using Windows.

I saw a reference somewhere in the Perl documentation saying that with the advent of Unicode support, it has become important to use binmode appropriately even on non-dosish systems. I cannot find that right now, but I do see this:

"For the sake of portability it is a good idea always to use it when appropriate, and never to use it when it isn't appropriate. Also, people can set their I/O to be by default UTF8-encoded Unicode, not bytes."

My point is that this isn't a "Windows (only) problem".

I speculate that if you print bytes with the high-bit set, from a non-utf-enabled instance of perl, run from a utf-enabled shell, this situation can arise, regardless of the OS you happen to be running on.


With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.

The start of some sanity?

Re^5: Standard handles inherited from a utf-8 enabled shell
by moritz (Cardinal) on Mar 21, 2012 at 19:02 UTC
    I speculate that if you print bytes with the high-bit set, from a non-utf-enabled instance of perl, run from a utf-enabled shell, this situation can arise, regardless of the OS you happen to be running on.

    I won't believe this until I've seen it, reproduced as a minimal example (disregarding things like shell aliases that add command line options, or the PERL5OPT, PERLIO or PERL_UNICODE environment variables).

      I won't believe this until I've seen it, reproduced as a minimal example

      As I said: agreed. But can you think of anything else that might fit with the symptoms described and the apparent solution?

      I couldn't, and all my attempts to try and re-create the situation also failed:

      perl -CO -e" system q[ \perl64\bin\perl.exe -e\" print pack 'B8', '11111111'; \" | od -t x1 ]"
      0000000 ff
      0000001

      perl -CO -e" system q[\perl64\bin\perl.exe -CO -e\"print pack 'B8', '11111111'; \" | od -t x1 ]"
      0000000 c3 bf
      0000002

      I would have expected the first of those to produce the same od output as the second, had the second instance of perl inherited the stdout characteristics of its parent.

      But I'm on Windows, and disproving the possibility here doesn't disprove it for other platforms, hence my asking the question.



        That's not how I see it. I see the system-ed perl as an autonomous process (unaware of its parent's settings) whose STDOUT filehandle is set to a different encoding in each of the two runs.

        In both cases, we're printing out a string with one character at codepoint U+00FF.

        The second system-ed perl has its output encoding set to UTF-8 (via -CO). What octets do we send out into the cruel world for the character U+00FF encoded in UTF-8? Ans: c3 bf.

        The first system-ed perl has its output "set" to byte/Latin-1 encoding (the default). What octets do we send out into the cruel world for the character U+00FF encoded in Latin-1? Ans: ff.

        The first case did not print c3 bf, despite the parent running with -CO, because the child's print never went through the parent's PerlIO layers.