in reply to Re: Character Encoding and Windows Console woes
in thread Character Encoding and Windows Console woes

perl -C -Mutf8 -e"print qq(\x{83})" >d.txt
The output file contains only one byte, 0x83. In UTF-8 it should have been two bytes. Printing to the console (without redirecting output) showed one character from the OEM character set.
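A minimal sketch of the two-byte expectation, assuming Perl 5.8.1 or later with the core Encode module. The helper name `hex_bytes` is made up for illustration. It compares the bytes for U+0083 when written with no encoding layer against the bytes produced by an explicit UTF-8 encode.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $char = "\x{83}";    # one character, code point U+0083

# With no output layer, Perl writes code points below 0x100 as single bytes,
# which matches the one-byte file described above.
my $raw = $char;

# An explicit UTF-8 encode yields the two-byte sequence.
my $utf8 = encode('UTF-8', $char);

print 'no layer: ', hex_bytes($raw),  "\n";    # 83
print 'UTF-8:    ', hex_bytes($utf8), "\n";    # C2 83
```

Calling binmode(STDOUT, ':encoding(UTF-8)') before the print should give the same two bytes on redirected output, without encoding by hand.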

The docs I have say that -C enables wide system calls (see ${^WIDE_SYSTEM_CALLS} in the perlvar manpage).

Re: Re: Re: Character Encoding and Windows Console woes
by BrowserUk (Patriarch) on Feb 16, 2004 at 22:05 UTC

    This changed in 5.8.1. From perlrun:

    -C <number/list>

    The -C flag controls some Unicode of the Perl Unicode features. <<Their typo not mine>>

    ~~snip~~

    (In Perls earlier than 5.8.1 the -C switch was a Win32-only switch that enabled the use of Unicode-aware ``wide system call'' Win32 APIs. This feature was practically unused, however, and the command line switch was therefore ``recycled''.)


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    Timing (and a little luck) are everything!
      Thanks for pointing that out. The HTML docs on one machine must not have been updated properly, since it's Perl 5.8.2. (I noticed that Win32::API is missing from the index pane, even though the module is present, so this is the second anomaly today.) On another machine, I see different -C documentation.

Re: Re: Re: Character Encoding and Windows Console woes
by Anonymous Monk on Feb 17, 2004 at 03:06 UTC
    In perlunicode:
    Unicode characters can also be added to a string by using the \x{...} notation. The Unicode code for the desired character, in hexadecimal, should be placed in the braces. For instance, a smiley face is \x{263A}. This encoding scheme only works for characters with a code of 0x100 or above.
    Something for backward compatibility, I think.
      If at least one character in the string has a code >= 0x100, then all characters > 0x7F will be multi-byte encoded. If all the character codes are below 0x100 (256), then it is also possible to store the string with one byte per character. Some functions, like chr, make a point of using the byte form when possible, since it's faster.
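A rough sketch of that distinction, assuming Perl 5.8.1 or later, where utf8::is_utf8 is available without loading any module. The helper name `storage` is invented for this example. It only reports which internal form Perl picked for each string.

```perl
use strict;
use warnings;

# Hypothetical helper: report how Perl is storing a string internally.
sub storage {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 'multi-byte' : 'one byte per character';
}

my $narrow = "\x{83}";            # every code point is below 0x100
my $wide   = "\x{83}\x{263A}";    # the smiley forces the multi-byte form

print 'narrow:      ', storage($narrow),      "\n";
print 'wide:        ', storage($wide),        "\n";
print 'chr(0x83):   ', storage(chr 0x83),     "\n";
print 'chr(0x263A): ', storage(chr 0x263A),   "\n";

# Either way, length() counts characters, not bytes.
print length($narrow), ' ', length($wide), "\n";    # 1 2
```

The internal form is a storage detail. What reaches a file or the console still depends on the output layer in effect.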