Re: Re: Character Encoding and Windows Console woes

perl -C -Mutf8 -e"print qq(\x{83})" >d.txt

The output file contains only one byte, 0x83. In UTF-8 it should have been two bytes (0xC2 0x83). When I printed to the console instead of redirecting the output, it showed one character in the OEM character set.

The docs I have say that -C enables wide system calls (See ${^WIDE_SYSTEM_CALLS} in the perlvar manpage.)
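For comparison, here is a minimal sketch of the difference between printing with and without an explicit encoding layer. It uses in-memory file handles instead of d.txt, and the variable names are only illustrative. One possible explanation for the single byte, offered as an assumption rather than a diagnosis: from 5.8.1 on, perlrun documents a bare -C as equivalent to -CSDL, where the L makes the UTF-8 layers conditional on the locale environment variables, and those are typically not set to a UTF-8 locale on Windows.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A single character with code point 0x83.
my $char = chr(0x83);

# No encoding layer: code points below 0x100 are written as single bytes.
my $raw = '';
open(my $raw_fh, '>', \$raw) or die "open failed: $!";
print {$raw_fh} $char;
close($raw_fh);

# Explicit :encoding(UTF-8) layer: the same character is written as two bytes.
my $encoded = '';
open(my $utf8_fh, '>:encoding(UTF-8)', \$encoded) or die "open failed: $!";
print {$utf8_fh} $char;
close($utf8_fh);

# Encode::encode produces the same bytes without going through a file handle.
my $bytes = encode('UTF-8', $char);
my $hex   = join ' ', map { sprintf '%02X', ord } split //, $bytes;

printf "raw: %d byte(s), layered: %d byte(s), encode(): %s\n",
    length($raw), length($encoded), $hex;
```

On a real file handle, binmode(STDOUT, ':encoding(UTF-8)') before the print should have the same effect as the layered open shown here.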


Replies are listed 'Best First'.
Re: Re: Re: Character Encoding and Windows Console woes by BrowserUk (Patriarch) on Feb 16, 2004 at 22:05 UTC
This changed in 5.8.1 (from perlrun):

    -C <number/list>
        The -C flag controls some Unicode of the Perl Unicode features. <<Their typo, not mine>>

        ~~snip~~

        (In Perls earlier than 5.8.1 the -C switch was a Win32-only switch that enabled the use of Unicode-aware "wide system call" Win32 APIs. This feature was practically unused, however, and the command line switch was therefore "recycled".)

Examine what is said, not who speaks. "Efficiency is intelligent laziness." -David Dunham "Think for yourself!" - Abigail Timing (and a little luck) are everything!
Re: Re: Re: Re: Character Encoding and Windows Console woes by John M. Dlugosz (Monsignor) on Feb 17, 2004 at 05:08 UTC
Thanks for pointing that out. The HTML docs on one machine must not have been updated properly, since it's Perl 5.8.2. (I noticed that Win32::API is missing from the index pane even though the module is present, so this is the second anomaly today.) On another machine, I see different -C documentation.
Re: Re: Re: Character Encoding and Windows Console woes by Anonymous Monk on Feb 17, 2004 at 03:06 UTC
In perlunicode:

    Unicode characters can also be added to a string by using the \x{...} notation. The Unicode code for the desired character, in hexadecimal, should be placed in the braces. For instance, a smiley face is \x{263A}. This encoding scheme only works for characters with a code of 0x100 or above.

Something for backward compatibility, I think.
Re: Re: Re: Re: Character Encoding and Windows Console woes by John M. Dlugosz (Monsignor) on Feb 17, 2004 at 05:12 UTC
If at least one character in the string has a code >= 0x100, the whole string is stored as UTF-8 internally, so every character above 0x7F becomes multi-byte. If all the characters are less than 256, it is also possible to store the string with one byte per character. Some functions, like chr, make it a point to use the byte form when possible, since it's faster.
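A small sketch of how to observe the two internal forms, assuming Perl 5.8.1 or later so that utf8::is_utf8 is available. The variable names are illustrative only, and bytes::length is used purely to peek at the internal storage, not as something to rely on in normal code.

```perl
use strict;
use warnings;
use bytes ();    # load the module for bytes::length without enabling byte semantics

my $narrow = chr(0x83);               # every code point is below 0x100
my $wide   = chr(0x83) . "\x{263A}";  # contains a code point >= 0x100

# utf8::is_utf8 reports whether the string is held internally as UTF-8.
my $narrow_flag = utf8::is_utf8($narrow) ? 1 : 0;
my $wide_flag   = utf8::is_utf8($wide)   ? 1 : 0;

# length counts characters; bytes::length counts the bytes of the internal form.
printf "narrow: utf8=%d chars=%d bytes=%d\n",
    $narrow_flag, length($narrow), bytes::length($narrow);
printf "wide:   utf8=%d chars=%d bytes=%d\n",
    $wide_flag, length($wide), bytes::length($wide);
```

In the wide string, the 0x83 character takes two bytes internally and the smiley takes three, even though length still reports two characters.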