in reply to Standard handles inherited from a utf-8 enabled shell

No. The "utf-8 enabled" is a property of the terminal, not of the shell. Perl isn't aware of it, so if you do something like

perl -E 'say chr(255)' | hexdump -C
00000000  ff 0a                                    |..|

the output encoding is Latin-1 (even if the locale is something with UTF-8).

Note that this changes for characters with codepoint > 255. Those can't be encoded in Latin-1, so UTF-8 is used for the whole string (and you get a "wide character" warning).
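
For comparison, a minimal sketch of the > 255 case (hexdump column widths approximate; the warning goes to STDERR, interleaved here as you would see it on a terminal):

perl -E 'say chr(256)' | hexdump -C
Wide character in say at -e line 1.
00000000  c4 80 0a                                 |...|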

Re^2: Standard handles inherited from a utf-8 enabled shell
by BrowserUk (Patriarch) on Mar 21, 2012 at 18:05 UTC
    No. The "utf-8 enabled" is a property of the terminal, not of the shell.

    Hm. When I used the term "shell", I (perhaps) wasn't specific enough.

    Please see subthread Re^3: Help with pack error, and then consider how:

    a byte value greater than 127, output from a perl script that makes no attempt to enable utf-anything, and piped directly to a process (od) that makes no attempt to perform any conversions or transformations of its input, could be seen by that process as 2 bytes?

    I.e. where is the utf-8'ness being applied?
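
    The only way I can see a single code point 255 leaving perl as two bytes is if something has pushed a :utf8 layer onto STDOUT behind the script's back, e.g. a -CS switch or a PERL_UNICODE setting inherited from the environment. A purely illustrative sketch, not a claim about the OP's actual setup:

    PERL_UNICODE=S perl -E 'say chr(255)' | od -An -tx1
     c3 bf 0a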


    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

    The start of some sanity?

      It shouldn't happen, and I don't know of any Perl version where it happens.

      But the example there isn't minimal at all (why load LWP::something? that could do some dire magic), and it isn't runnable for me (what does it read from STDIN?), so it's hard to tell.

      And I'm not sure what happens on windows when you try to write binary data to a text file handle. (Linux doesn't have that distinction, and I don't use windows for programming, so I don't know what the expected outcome is. I remember one unhappy foray into windows programming where I spent several hours debugging a missing "b" in a call to open when reading files).
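
      (For what it's worth, the Perl-level equivalent of that missing "b" is binmode, or a :raw layer on open. A minimal sketch, using a hypothetical data.bin, of how I'd sidestep the text-mode translation on windows:)

      open my $fh, '<', 'data.bin' or die "open: $!";
      binmode $fh;   # turn off the CRLF translation (and ^Z-as-EOF) that text mode applies on windows
      read $fh, my $buf, 1024;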

        the example there isn't minimal at all

        Agreed. I've also no faith in the fidelity of the OP's descriptions.

        I'm not sure what happens on windows

        It is not at all clear to me that the OP of that thread is even using Windows.

        I saw a reference somewhere in the Perl documentation saying that with the advent of Unicode support, it has become important to use binmode appropriately even on non-dosish systems. I cannot find that right now, but I do see this:

        "For the sake of portability it is a good idea always to use it when appropriate, and never to use it when it isn't appropriate. Also, people can set their I/O to be by default UTF8-encoded Unicode, not bytes."

        My point is that this isn't a "windows (only) problem".

        I speculate that if you print bytes with the high bit set, from a non-utf-enabled instance of perl run from a utf-enabled shell, this situation can arise regardless of the OS you happen to be running on.
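
        One way to test that speculation (a sketch only; the layer names are what I'd expect from a typical linux perl, and PERL_UNICODE=S stands in for whatever the "utf-enabled" environment might actually set) is to ask perl which layers STDOUT really has:

        perl -MPerlIO -E 'say join " ", PerlIO::get_layers(*STDOUT)'
        unix perlio
        PERL_UNICODE=S perl -MPerlIO -E 'say join " ", PerlIO::get_layers(*STDOUT)'
        unix perlio utf8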

