Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I have PERL_UNICODE set to SDL, ${^UNICODE} reports back 95.(¹)

This is really comfy for my UTF8 locale, as filehandles and STDFOO are already marked as :utf8 or something(²) and the utf8-related warnings disappear and that allows me to be more lazy while programming, especially for oneliners.

But I cannot get rid of this one which happens whenever I pass a descriptive string to Test::More's ok() with characters whose ord() > 255.(³)

$ perl -e'use Test::More tests => 1; ok 1, "\x{2639}";'
1..1
Wide character in print at /usr/lib/perl5/5.8.8/Test/Builder.pm line 1172.
ok 1 - ☹

I tried to track down the cause of that(⁴) and suspect it's the way filehandles are set up at Test/Builder.pm ll. 1317. I managed to reproduce a minimal test case in the debugger.

$ perl -d -e0

Loading DB routines from perl5db.pl version 1.28
Editor support available.

Enter h or `h h' for help, or `man perldebug' for more help.

main::(-e:1):   0
  DB<1> open(TESTOUT, ">&STDOUT") or die "Can't dup STDOUT: $!";
  DB<2> $fh = \*TESTOUT;
  DB<3> print $fh "\x{2639}";
Wide character in print at (eval 23)[/usr/lib/perl5/5.8.8/perl5db.pl:628] line 2.
 at (eval 23)[/usr/lib/perl5/5.8.8/perl5db.pl:628] line 2
        eval '($@, $!, $^E, $,, $/, $\\, $^W) = @saved;package main; $^D = $^D | $DB::db_stop; print $fh "\\x{2639}";; ;' called at /usr/lib/perl5/5.8.8/perl5db.pl line 628
        DB::eval called at /usr/lib/perl5/5.8.8/perl5db.pl line 3412
        DB::DB called at -e line 1
  DB<4> ☹

However, if I put these three lines into perl -e or a file and run it, I do not get the warning at all. Strange, huh?

(¹) See perlrun and perlvar.

(²) How do I know what modes lie on a filehandle, anyway? PerlIO::get_layers always just returns qw(unix perlio) which is too little to be possibly true.

(³) Actually I use utf8; and have literal characters in that string instead of \x escaped, but it doesn't matter.

(⁴) And in the course of that I found a workaround for the problem: binmode Test::More->builder->output, ':utf8';
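For reference, here is a minimal sketch of how that workaround might look in a complete test file. It assumes a Test::More whose builder provides output, failure_output and todo_output; covering the two extra handles is an addition to the one-liner above, on the assumption that diagnostics and TODO output can contain wide characters too.

```perl
use strict;
use warnings;
use Test::More tests => 1;

# Test::Builder prints through its own handles rather than through
# STDOUT/STDERR directly, so the utf8 layer is set on each of them.
my $builder = Test::More->builder;
binmode $builder->output,         ':utf8';   # normal "ok" lines
binmode $builder->failure_output, ':utf8';   # diagnostics for failed tests
binmode $builder->todo_output,    ':utf8';   # diagnostics for TODO tests

# A test description with a character whose ord() > 255.
ok 1, "\x{2639}";
```

The binmode calls need to run before the first test that prints a wide character.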

Replies are listed 'Best First'.
Re: bug? duped filehandles and unicode
by Joost (Canon) on May 12, 2007 at 20:03 UTC
    update: I didn't spot your PERL_UNICODE remark. Ignore the two paragraphs below

    If you have a recent perl (5.8.1 or higher) I'm pretty sure your original filehandles are NOT marked as utf-8, since there is no way they will be set to utf-8 automatically.

    Observe: (this is perl 5.8.8 with $LANG=en_US.UTF-8 on an xterm in utf8 mode):

    $ perl -e'print "\x{2639}"'
    Wide character in print at -e line 1.
    ☹

    But yeah, it seems strange that duplicating a filehandle doesn't copy its encoding. On the other hand, if you open() a file without explicitly setting an encoding, it defaults to the system's default 8-bit encoding (see perluniintro and perlopentut), so there's something to be said for the current behaviour.

    Also, PerlIO::get_layers is (probably) correct. You don't see the utf8 layer because it's not set (see above).

    update2: about that get_layers issue: it seems to work correctly for me (taking into account that layers won't be copied):

    PERL_UNICODE=SDL perl -e'print PerlIO::get_layers(STDOUT)'
    unixperlioutf8
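To check on a given installation whether a dup'd handle carries the same layers as the original, a small script along these lines might help. It is only a sketch: layers_of is a hypothetical helper name, the three-argument open stands in for the two-argument form used in the question, and no expected output is given because what the dup inherits may depend on the perl version and on PERL_UNICODE.

```perl
use strict;
use warnings;

# Hypothetical helper: the PerlIO layers on a handle as one string.
sub layers_of {
    my ($fh) = @_;
    return join ' ', PerlIO::get_layers($fh);
}

# Dup STDOUT (three-argument form of the open in the question).
open(my $dup, '>&', \*STDOUT) or die "Can't dup STDOUT: $!";

print 'STDOUT:        ', layers_of(\*STDOUT), "\n";
print 'dup (as-is):   ', layers_of($dup),     "\n";

# Setting the layer explicitly removes any dependence on what the
# dup did or did not inherit from the original handle.
binmode $dup, ':utf8';
print 'dup (binmode): ', layers_of($dup), "\n";
```

Running it once with PERL_UNICODE=SDL set and once without shows whether the first two lines differ on that perl.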
Re: bug? duped filehandles and unicode
by Khen1950fx (Canon) on May 12, 2007 at 23:24 UTC
    Here's an interesting discussion related to your question.
Re: bug? duped filehandles and unicode
by Burak (Chaplain) on May 14, 2007 at 10:14 UTC
    I've opened a ticket for this problem in RT, but no response yet. You can see my post in the Test::Simple queue (but RT seems to be dead at the time I'm writing this). The actual code I'm using is this:
    eval q{ binmode Test::More->builder->output, ':utf8'; } if $] >= 5.008;