#!perl
use strict;
use warnings;
use charnames ':full';
use Win32::Console;
my $c = Win32::Console->new(STD_OUTPUT_HANDLE);
$c->OutputCP(65001); # we write UTF-8
binmode STDOUT, ':encoding(UTF-8)';
print "\N{INFINITY}\n";
In my tests, the output code page persisted after the program run, so you might (or might not) want to save/restore the codepage:
my $oldCP = $c->OutputCP();
$c->OutputCP(65001); # we write UTF-8
END{
if( $c ) {
$c->OutputCP($oldCP); # restore the original code page
}
}
...
In my tests, the output code page persisted after the program run
Interesting, in my test on Win 10 Pro with Win32::SetConsoleOutputCP(65001), the codepage change didn't persist.
Update: Sorry, never mind, see my reply below!
tmp.pl
#!perl
use strict;
use warnings;
use charnames ':full';
use Win32::Console;
binmode STDOUT, ':encoding(UTF-8)';
print "\N{INFINITY}\n";
tmp2.pl
#!perl
use strict;
use warnings;
use charnames ':full';
use Win32;
Win32::SetConsoleOutputCP(65001);
binmode STDOUT, ':encoding(UTF-8)';
print "\N{INFINITY}\n";
And the console output:
C:>perl q:\tmp.pl
Ôê×
C:>perl q:\tmp2.pl
∞
C:>perl q:\tmp.pl
∞
C:>chcp
Aktive Codepage: 850.
I'd expect the code page not to persist, and the output of CHCP agrees with that. The terminal output tells a different story, though: after the first change to UTF-8, the output of subsequent programs is also interpreted as UTF-8 ...
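One way to dig into this discrepancy is to ask the console API directly instead of relying on CHCP. This is only a minimal sketch, assuming the Win32 module's GetConsoleCP and GetConsoleOutputCP functions. The helper name console_code_pages is made up for illustration, and it returns nothing useful when not run on Windows.

```perl
use strict;
use warnings;

# Hypothetical helper: report the console's input and output code pages.
# Returns undef (in scalar context) when not running on Windows.
sub console_code_pages {
    return unless $^O eq 'MSWin32';
    require Win32;
    return {
        input  => Win32::GetConsoleCP(),
        output => Win32::GetConsoleOutputCP(),
    };
}

if ( my $cp = console_code_pages() ) {
    print "input code page:  $cp->{input}\n";
    print "output code page: $cp->{output}\n";
}
else {
    print "not on Windows, no console code page to report\n";
}
```

Running this before and after tmp2.pl might show whether one of the two values changes while the other stays at 850.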
Then, when the chr(177) gets written to a text file, it displays as the desired plus-or-minus symbol when viewed in Windows notepad.
In this case it just so happens that the UTF-8 encoding of U+00B1 PLUS-MINUS SIGN is C2 B1 and I guess that the C compiler is doing the equivalent of unsigned char a = 0xC2B1 & 0xFF. It also just so happens that 0xB1 (177) is the character ± in CP1252, Latin-1, and others (which I guess is Notepad's interpretation), but in CP850 and CP437, 0xB1 is ▒. You'll probably not see this happening with ∞ U+221E INFINITY, whose UTF-8 encoding is E2 88 9E, but which is 0xEC in CP437, and which has no representation in the other three encodings I mentioned. Oh, the joys of single-byte encodings :-)
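The byte values above can be checked from Perl itself. This is a minimal sketch using the core Encode module. The helper names bytes_in and char_for_byte are invented for this example, and the output is plain ASCII so it prints the same under any code page.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hex bytes of $char in $encoding, or undef if the character cannot be encoded.
sub bytes_in {
    my ($encoding, $char) = @_;
    my $copy  = $char;    # encode() may modify its argument when a CHECK value is given
    my $bytes = eval { encode( $encoding, $copy, Encode::FB_CROAK() ) };
    return undef unless defined $bytes;
    return join ' ', map { sprintf '%02X', $_ } unpack 'C*', $bytes;
}

# Code point (as "U+XXXX") that a single byte decodes to in $encoding.
sub char_for_byte {
    my ($encoding, $byte) = @_;
    return sprintf 'U+%04X', ord( decode( $encoding, chr($byte) ) );
}

# How PLUS-MINUS SIGN and INFINITY are encoded in each encoding.
for my $enc (qw(UTF-8 cp1252 iso-8859-1 cp850 cp437)) {
    printf "%-10s  U+00B1 => %-8s  U+221E => %-8s\n",
        $enc,
        bytes_in( $enc, "\x{B1}" )   // '(none)',
        bytes_in( $enc, "\x{221E}" ) // '(none)';
}

# What the single byte 0xB1 means in each single-byte encoding.
for my $enc (qw(cp1252 iso-8859-1 cp850 cp437)) {
    printf "%-10s  byte 0xB1 => %s\n", $enc, char_for_byte( $enc, 0xB1 );
}
```

U+2592 in the second list is the ▒ character mentioned above.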
Using that codepage, this troublesome C library function (in mpc-1.3.x) by the name of "mpcr_out_str", then displays correctly when accessed from the Math::MPC module.
It would seem logical to me then that the library is outputting UTF-8.
Is there some way I can manipulate the active code page in perl (on windows) without shelling out to chcp ?
I should note I'm not an expert on this topic - but this works for me:
use warnings;
use strict;
use open qw/:std :encoding(UTF-8)/;
use Win32;
Win32::SetConsoleOutputCP(65001);
print "\N{U+B1}\N{U+221E}\n";
I'm thinking that, for Windows only, Math::MPC needs to change the codepage to 65001 before calling this function ... and then it ought also revert the codepage to its original setting immediately after the function has been run.
CP65001 is UTF-8, and IMHO UTF-8 is probably the most universal encoding, so unless you've got some other funky Unicode stuff going on, I don't think you'd need to change it back. The boilerplate I showed above should be fine for the entire process. Beyond that, according to the sources I found, you need to make sure that your terminal is using a Unicode-capable font.
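If you do want the switch-and-revert behaviour described in the question, here is a minimal sketch of how it might look on the Perl side. It assumes the Win32 module's GetConsoleOutputCP and SetConsoleOutputCP functions. The helper name with_output_cp is hypothetical and not part of Math::MPC or any other module.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with the console output code page set to
# $cp, then put the previous code page back, even if $code dies.
sub with_output_cp {
    my ($cp, $code) = @_;

    # Nothing to switch outside Windows; just run the code.
    return $code->() unless $^O eq 'MSWin32';

    require Win32;
    my $old = Win32::GetConsoleOutputCP();
    Win32::SetConsoleOutputCP($cp);

    my @result = eval { $code->() };
    my $error  = $@;

    # Restore only if we managed to read a previous code page.
    Win32::SetConsoleOutputCP($old) if $old;

    die $error if $error;
    return wantarray ? @result : $result[0];
}

binmode STDOUT, ':encoding(UTF-8)';
$| = 1;    # flush STDOUT after every print, before the code page is restored
with_output_cp( 65001, sub { print "\N{U+B1}\N{U+221E}\n" } );
```

Anything still sitting in an output buffer when the old code page is restored would presumably be displayed under that old code page, which is why the sketch turns on autoflush. Output written by a C library through its own stdio buffers may need flushing on the C side as well.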
Win32::Console::OutputCP( 65001 );