This is clearly not a Perl problem (or at least I don't think so), but I can't think of a better place to get an understanding of what is happening.

If I run a bash shell using xterm -u8 (-u8 turns on utf8 mode for xterm), the following lines in a Perl script appear to make the script die without even executing its END blocks:

#print first letter of Hebrew alphabet (aleph)
my $ch=chr(0x5D0);
print STDERR "$ch\n";

This is just an appearance. In reality the line doesn't kill the script at all. I was able to run the same script in an xemacs command shell, and it is clear that the script is running to completion. (see below for test code and output). I can also avoid sudden death by starting up xterm in wide character mode (xterm -u8 -wc).

I'd like to understand why U05D0 causes sudden death when wide character mode is off. Here in Israel, the first letter of the Hebrew alphabet (aleph) isn't exactly an exotic character. The more serious problem is that any test script I have also goes silent and appears to die if it prints a diagnostic containing that character, unless it is running in a specially configured terminal. Not good.
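As a possible stopgap for test diagnostics, here is a minimal sketch that escapes every non-ASCII character before it reaches the terminal. The helper name escape_non_ascii is hypothetical (it is not part of any module), and the sketch only sidesteps the terminal behaviour; it does not explain it.

```perl
use strict;
use warnings;

# Hypothetical helper (not from any module): replace each non-ASCII
# character with a Perl-style \x{...} escape so that only ASCII bytes
# are ever sent to the terminal.
sub escape_non_ascii {
    my ($text) = @_;
    (my $safe = $text) =~ s/([^\x00-\x7F])/sprintf('\\x{%x}', ord $1)/ge;
    return $safe;
}

my $ch = chr(0x5D0);    # aleph
print STDERR escape_non_ascii("diagnostic containing $ch"), "\n";
```

With this in place a diagnostic would show the text \x{5d0} instead of the glyph, which is less readable but should survive any terminal configuration.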

Other utf8 characters sometimes display two characters where I expect one, or display the wrong glyph (or a placeholder box). I could understand ugly output, but what is special about U05D0 that would make a terminal think it should stop displaying output sent to STDOUT and STDERR?
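To see exactly which bytes the terminal receives for a given character, a small sketch like the following may help. The helper name utf8_bytes is hypothetical; Encode::encode is the standard core function. It prints each byte of the UTF-8 encoding in hex and in the octal notation that Devel::Peek uses.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 bytes of one code point,
# each formatted as hex plus the octal escape Devel::Peek would show.
sub utf8_bytes {
    my ($codepoint) = @_;
    my $octets = encode('UTF-8', chr($codepoint));
    return map { sprintf('0x%02X (\\%03o)', ord $_) } split //, $octets;
}

# Prints: 0xD7 (\327) 0x90 (\220)
print join(' ', utf8_bytes(0x5D0)), "\n";
```

It may or may not be relevant that the second byte, 0x90, lies in the 0x80-0x9F range reserved for C1 control codes; that is only an observation about the byte values, not a confirmed explanation.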

Also if there are any Israeli monks out there (or Hebrew speaking monks from other parts of the world) reading this who are familiar with this issue and have a work around they use, please speak up!

Platform details:

Debian (Lenny)
system perl (5.10.0)
xterm version: XTerm(235)
bash: GNU bash, version 3.2.39(1)-release (i486-pc-linux-gnu)

Test script:

use strict;
use warnings;
use PerlIO;
use Devel::Peek;

my $ch=chr(0x5D0);
Devel::Peek::Dump($ch);

binmode(STDERR);
print STDERR "layers for STDERR: @{[PerlIO::get_layers(STDERR)]}\n";
print STDERR "$ch\n"; #complains about wide character

binmode(STDERR, ":utf8");
print STDERR "layers for STDERR: @{[PerlIO::get_layers(STDERR)]}\n";
print STDERR "$ch\n"; # no complaints here

print STDERR "I survived :-) !!!\n";
print STDOUT "I really did. I really did.\n";

# End blocks to help verify that STDERR output is being
# truncated, and script is not merely aborting
END { warn "Ah...dead\n"; }
END { warn "I'm dying :-( \n" }

Output in Xemacs shell:

SV = PV(0x817c6d0) at 0x8197e90
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x819e970 "\327\220"\0 [UTF8 "\x{5d0}"]
  CUR = 2
  LEN = 4
layers for STDERR: unix perlio
Wide character in print at Monks/Foo.pm line 916.
\220א
layers for STDERR: unix perlio utf8
\220א
I survived :-) !!!
I really did. I really did.
I'm dying :-(
Ah...dead

Output on xterm -u8 -wc (widechar on) - output is the same as xemacs except that U05D0 prints as "" not "\220"

SV = PV(0x817c6d0) at 0x8197e90
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x819e970 "\327\220"\0 [UTF8 "\x{5d0}"]
  CUR = 2
  LEN = 4
layers for STDERR: unix perlio
Wide character in print at Monks/Foo.pm line 916.
layers for STDERR: unix perlio utf8
I survived :-) !!!
I really did. I really did.
I'm dying :-(
Ah...dead

Output on xterm -u8 (widechar off). Notice how, after the wide character warning, all output to STDOUT and STDERR disappears, as if U05D0 causes STDOUT and STDERR to close. Note that it does not hang. The script just terminates with no further visible output, and a prompt for a new command appears.

$ perl myscript.pl
SV = PV(0x817c6d0) at 0x8197e90
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x81a2560 "\327\220"\0 [UTF8 "\x{5d0}"]
  CUR = 2
  LEN = 4
layers for STDERR: unix perlio
Wide character in print at Monks/Foo.pm line 916.
$

Note: switching the order of output so that output to STDOUT with a utf8 layer comes first does not improve the situation. Instead of dying after the warning, it dies silently on the print statement.

$ perl myscript.pl
SV = PV(0x817c6d0) at 0x8197e90
  REFCNT = 1
  FLAGS = (PADMY,POK,pPOK,UTF8)
  PV = 0x819f610 "\327\220"\0 [UTF8 "\x{5d0}"]
  CUR = 2
  LEN = 4
layers for STDERR: unix perlio utf8
$

Update: clarified that the script terminates with no further output and does not hang.


In reply to Printing the first letter of the Hebrew alphabet (U05D0) kills script? by ELISHEVA
