peterp has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

edit: apologies but the utf8 characters are not displaying properly in code. Each string is ϗblah頁.

I am having trouble understanding the difference in behaviour between the two code snippets below. The first example generates a wide character error, as expected, since I haven't set STDOUT's encoding to utf8. However, in the second example, although I have pointed STDERR at STDOUT, no such error is generated on warn, and it successfully prints out the utf8 characters. What exactly is happening here?

example 1

use strict;
use warnings FATAL => qw#all#;
use utf8;
#print qq#content-type: text/plain; charset=UTF-8\n\n#;
print q#print: ϗblah頁#; # wide char error.

example 2

use strict;
use warnings FATAL => qw#all#;
use utf8;
*STDERR = *STDOUT;
#print qq#content-type: text/plain; charset=UTF-8\n\n#;
warn q#warn: ϗblah頁#; # no error.
print q#print: ϗblah頁#; # wide char error still.

Thanks,

Peter

Re: Unexpectedly no wide char error when stderr points at stdout
by hippo (Archbishop) on Feb 14, 2015 at 10:16 UTC

    Your test script perfectly demonstrates that there is nothing different in this case about the actual STDOUT and STDERR filehandles. The difference lies in the functions used to send them the output. print just prints, but warn does a lot more besides: it appends line numbers, substitutes a default phrase if none is given, and can be overridden by $SIG{__WARN__}.
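One way to check the claim about the filehandles on your own perl is to compare their PerlIO layers. The following is a minimal sketch, not something from the original post: `layers_of` is a hypothetical helper name, and the aliasing is wrapped in `local` (an assumption made here so that STDERR is restored afterwards).

```perl
use strict;
use warnings;

# Hypothetical helper: return the PerlIO layers of a handle
# as one space-separated string.
sub layers_of {
    my ($fh) = @_;
    return join ' ', PerlIO::get_layers($fh);
}

my ($out_layers, $err_layers);
{
    # Alias STDERR to STDOUT only inside this block,
    # mirroring "*STDERR = *STDOUT" from the second example.
    local *STDERR = *STDOUT;
    $out_layers = layers_of(\*STDOUT);
    $err_layers = layers_of(\*STDERR);
}

print "STDOUT: $out_layers\n";
print "STDERR: $err_layers\n";
print $out_layers eq $err_layers ? "same layers\n" : "different layers\n";
```

If both names report the same layers, any difference between print and warn has to come from the functions rather than from the handles.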

    We can guess, therefore, that warn is doing something else behind the scenes to set its encoding to utf-8, and thus avoids the warning. This is the same conclusion reached in this blog post and seems the likeliest reason. perlunifaq also contains this little gem:

    It's good that you lost track, because you shouldn't depend on the internal format being any specific encoding. But since you asked: by default, the internal format is either ISO-8859-1 (latin-1), or utf8, depending on the history of the string. On EBCDIC platforms, this may be different even.

    We can conclusively show that warn does something to the string by overriding it. eg:

    use strict;
    use utf8;
    warn q#warn: ϗblah頁#;
    $SIG{__WARN__} = sub { print @_ };
    warn q#warn: ϗblah頁#;

    (As in the OP, those should be the actual Unicode chars.) Here, the second warn generates the wide character error whereas the first does not. To know more about what the default warn handler is doing, you would have to start digging in the perl internals.
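To see when print itself raises the warning, the warnings can be captured instead of being left to reach the terminal. This is only a sketch under stated assumptions: `warnings_when_printing` is a hypothetical helper, and it prints to an in-memory filehandle rather than to STDOUT, on the assumption that the "Wide character" check behaves the same way there. It says nothing about what the default warn handler does internally.

```perl
use strict;
use warnings;
use utf8;    # the literals below are real Unicode characters

# Hypothetical helper: print $text to a fresh in-memory handle opened
# with $mode and return every warning raised while doing so.
sub warnings_when_printing {
    my ($mode, $text) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, @_ };
    open my $fh, $mode, \my $buffer
        or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh;
    return @caught;
}

# No encoding layer versus an explicit UTF-8 layer.
my @plain = warnings_when_printing('>', "ϗblah頁");
my @utf8  = warnings_when_printing('>:encoding(UTF-8)', "ϗblah頁");

printf "no layer: %d warning(s), UTF-8 layer: %d warning(s)\n",
    scalar @plain, scalar @utf8;
```

The handle without an encoding layer is the case that corresponds to the unconfigured STDOUT in the original examples.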

    For other readers: this is all the case under v5.10.1 as reported by peterp, other versions may vary.

      Hippo,

      Firstly, thank you very much for your detailed response. You have reassured me that it's not a system-level issue out of my control, as I was afraid of, and I have gone ahead and used the open pragma to binmode all standard streams to utf8. The blog you linked to was particularly useful and serves to confirm what you stated.
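For readers who want to try the same fix: the exact line Peter used is not shown in the thread, so the following is just one common spelling of it. The `:std` flag of the open pragma applies the given layer to STDIN, STDOUT and STDERR.

```perl
use strict;
use warnings FATAL => qw(all);
use utf8;                            # source literals are decoded as UTF-8
use open qw(:std :encoding(UTF-8));  # UTF-8 layer on STDIN, STDOUT, STDERR

# With the layer in place, neither call should trigger
# the "Wide character" warning.
print "print: ϗblah頁\n";
warn  "warn: ϗblah頁\n";
```

Because the warnings are fatal in this script, it would die at the print if the layer had not been applied.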

      Peter

Re: Unexpectedly no wide char error when stderr points at stdout
by choroba (Cardinal) on Feb 14, 2015 at 00:53 UTC
    What Perl version and OS? Behaves the same (i.e. throws the error) in both cases for me on 5.18.1 / Linux.
    لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ

      I am using Perl '5.10.1' and OS 'Debian GNU/Linux 6.0'. I should also mention I am running this on a 1and1 shared host.

      Peter