Unexpectedly no wide char error when stderr points at stdout

peterp has asked for the wisdom of the Perl Monks concerning the following question:

Hi,

edit: apologies but the utf8 characters are not displaying properly in code. Each string is ϗblah頁.

I am having trouble understanding the difference in behaviour between the two code snippets below. The first example generates a wide character error, as expected, since I haven't set STDOUT's encoding to utf8. However, in the second example, although I have pointed STDERR at STDOUT, no such error is generated on warn, and it successfully prints out the utf8 characters. What exactly is happening here?

example 1

use strict;
use warnings FATAL => qw#all#;
use utf8;

#print qq#content-type: text/plain; charset=UTF-8\n\n#;

print q#print: &#983;blah&#38913;#; # wide char error.

example 2

use strict;
use warnings FATAL => qw#all#;
use utf8;

*STDERR = *STDOUT;

#print qq#content-type: text/plain; charset=UTF-8\n\n#;

warn q#warn: &#983;blah&#38913;#; # no error.
print q#print: &#983;blah&#38913;#; # wide char error still.

Thanks,

Peter


Replies are listed 'Best First'.
Re: Unexpectedly no wide char error when stderr points at stdout by hippo (Archbishop) on Feb 14, 2015 at 10:16 UTC
Your test script perfectly demonstrates that there is nothing different in this case about the actual STDOUT and STDERR filehandles. The difference lies in the functions used to send them the output. `print` just prints, but `warn` does a lot more besides: it appends line numbers, substitutes a default phrase if none is given, and can be overridden by `$SIG{__WARN__}`.

We can guess, therefore, that `warn` is doing something else behind the scenes to set its encoding to utf-8, and that this is what avoids the warning. This is the same conclusion reached in this blog post and seems the likeliest reason. perlunifaq also contains this little gem:

"It's good that you lost track, because you shouldn't depend on the internal format being any specific encoding. But since you asked: by default, the internal format is either ISO-8859-1 (latin-1), or utf8, depending on the history of the string. On EBCDIC platforms, this may be different even."

We can conclusively show that `warn` does something to the string by overriding it, e.g.:

use strict;
use utf8;

warn q#warn: ϗblah頁#;
$SIG{__WARN__} = sub { print @_ };
warn q#warn: ϗblah頁#;

(As in the OP, those should be the actual Unicode characters.)

Here, the second `warn` generates the wide character error whereas the first does not. To know more about what the default warn handler is doing, you would have to start digging in the perl internals.

For other readers: this is all the case under v5.10.1, as reported by peterp; other versions may vary.
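A minimal sketch, not taken from the thread, for checking on a given perl that the "Wide character" warning from `print` depends on the layers of the handle being written to. The helper name `warnings_when_printing` is hypothetical, and in-memory handles stand in for STDOUT here, so this is only an approximation of the situation in the OP:

```perl
use strict;
use warnings;
use utf8;    # the string literals below are UTF-8 in the source

# Hypothetical helper: print $text to an in-memory handle opened with $mode
# and return whatever warnings perl emitted while doing so.
sub warnings_when_printing {
    my ($mode, $text) = @_;
    my @caught;
    local $SIG{__WARN__} = sub { push @caught, @_ };
    open my $fh, $mode, \my $buffer or die "open failed: $!";
    print {$fh} $text;
    close $fh;
    return @caught;
}

# Same string, same function; only the handle's layers differ.
my @plain   = warnings_when_printing('>',                 "ϗblah頁");
my @encoded = warnings_when_printing('>:encoding(UTF-8)', "ϗblah頁");

printf "no layer:        %d warning(s)\n", scalar @plain;
printf ":encoding layer: %d warning(s)\n", scalar @encoded;
```

This says nothing about what the default `warn` handler does internally; it only isolates the `print` side of the comparison.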
Re^2: Unexpectedly no wide char error when stderr points at stdout by peterp (Sexton) on Feb 14, 2015 at 14:08 UTC
Hippo,

Firstly, thank you very much for your detailed response. You have reassured me that it's not a system-level issue outside my control, as I had feared, and I have gone ahead and used the open pragma to binmode all standard streams to utf8. The blog you linked to was particularly useful and serves to confirm what you stated.

Peter
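For readers following along, a sketch of what using the open pragma on the standard streams might look like. The exact line peterp used is not shown in the thread, so the `:encoding(UTF-8)` spelling below is an assumption (`:utf8` is another common choice):

```perl
use strict;
use warnings;
use utf8;                              # source literals are UTF-8
use open qw(:std :encoding(UTF-8));    # apply a UTF-8 layer to STDIN, STDOUT and STDERR

# With the layer in place, print should no longer raise "Wide character".
print "print: ϗblah頁\n";
warn  "warn: ϗblah頁\n";
```

Setting the layer once at the top of the script keeps `print` and `warn` consistent, instead of relying on whatever `warn` happens to do on a particular perl version.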
Re: Unexpectedly no wide char error when stderr points at stdout by choroba (Cardinal) on Feb 14, 2015 at 00:53 UTC
What Perl version and OS? Behaves the same (i.e. throws the error) in both cases for me on 5.18.1 / Linux.

لսႽ† ᥲᥒ⚪⟊Ⴙᘓᖇ Ꮅᘓᖇ⎱ Ⴙᥲ𝇋ƙᘓᖇ
Re^2: Unexpectedly no wide char error when stderr points at stdout by peterp (Sexton) on Feb 14, 2015 at 01:01 UTC
I am using Perl '5.10.1' and OS 'Debian GNU/Linux 6.0'. I should also mention I am running this on a 1and1 shared host.

Peter