I don't know the answer to your question, but I'll make some guesses and maybe they'll help.

print chr(0x141); and print chr(321); print the same thing because 0x141 in hex and 321 in decimal are the same number. The statement print "\x{141}"; gives me an "Illegal hex digit ignored..." warning when I use the -w switch, and simply prints {141}.
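
Here is a quick sketch to convince yourself the two literals are the same value. The variable names are just ones I made up for illustration:

```perl
use strict;
use warnings;

# 0x141 is simply another way of writing the integer 321.
my $from_hex     = 0x141;
my $from_decimal = 321;
my $from_string  = hex("141");   # hex() turns a hex string into a number

print "same value\n"
    if $from_hex == $from_decimal && $from_string == $from_decimal;

# Show the one number in both notations.
printf "%d decimal is %x hex\n", $from_decimal, $from_decimal;
```

Since chr() only ever sees the number, it cannot tell which notation you typed.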

Standard ASCII only defines characters 0 through 127 decimal (0 through 255 if you count the various "Extended ASCII" sets). When you attempt to print chr(321);, Perl simply drops the bits that don't fit in a single byte and prints an "A", which is 65 in ASCII (321 - 256 = 65).
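
The arithmetic can be sketched like this. Note that low_byte is a hypothetical helper I wrote to mimic the truncation described above; it is not something Perl provides, and a Perl with Unicode support may well treat chr(321) as a real character instead of truncating it:

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the low 8 bits of a character code,
# which is what "dropping the irrelevant bits" amounts to.
sub low_byte {
    my ($code) = @_;
    return $code & 0xFF;   # same as $code % 256 for non-negative integers
}

my $kept = low_byte(321);   # 321 - 256 = 65
print "$kept\n";
print chr($kept), "\n";     # 65 is "A" in ASCII
```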

Since Japanese characters (and Unicode characters in general) are usually represented by more than one byte, perhaps what is happening is that you have some weird buffering problem where an extra byte gets moved to STDOUT, thereby throwing off the bytes that follow. Typically, Perl buffers output to STDOUT, so things get printed only after enough "stuff" has accumulated. This is a performance improvement, but it can cause problems if things are being written to STDOUT slowly or unusually. Try setting $| to a true value ($| = 1;), which turns on autoflush for output, and see what you get. Maybe you'll autoflush an errant byte?
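
A minimal sketch of turning autoflush on, assuming STDOUT is still the currently selected filehandle (it is unless you have called select). Whether this actually changes what you see on Win2k is only a guess on my part:

```perl
use strict;
use warnings;

# Any true value in $| makes Perl flush the currently selected
# handle (STDOUT by default) after every print or write.
$| = 1;

print "this line is flushed right away\n";
print chr(65), "\n";   # a plain one-byte character, for comparison
```

If the output looks different with and without the $| = 1; line, buffering is at least part of the story.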

Cheers,
Ovid


In reply to (Ovid) Re: Unicode on Win2k by Ovid
in thread Unicode on Win2k by Mike McClellan
