in reply to Re^2: Encoding problem with function in C library
in thread Encoding problem with function in C library

Then, when the chr(177) gets written to a text file, it displays as the desired plus-or-minus symbol when viewed in Windows notepad.

In this case it just so happens that the UTF-8 encoding of U+00B1 PLUS-MINUS SIGN is C2 B1 and I guess that the C compiler is doing the equivalent of unsigned char a = 0xC2B1 & 0xFF. It also just so happens that 0xB1 (177) is the character ± in CP1252, Latin-1, and others (which I guess is Notepad's interpretation), but in CP850 and CP437, 0xB1 is ▒. You'll probably not see this happening with ∞ U+221E INFINITY, whose UTF-8 encoding is E2 88 9E, but which is 0xEC in CP437, and which has no representation in the other three encodings I mentioned. Oh, the joys of single-byte encodings :-)
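For anyone who wants to check those byte values rather than take my word for it, here is a minimal sketch using the core Encode module. The helper name `hex_bytes` and the variable names are mine, purely for illustration; the encoding names are the ones Encode ships with.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Illustrative helper: render a byte string as hex pairs, e.g. "C2 B1".
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

my $plus_minus = "\x{B1}";      # U+00B1 PLUS-MINUS SIGN
my $infinity   = "\x{221E}";    # U+221E INFINITY

my $pm_utf8   = hex_bytes(encode('UTF-8', $plus_minus));   # C2 B1
my $inf_utf8  = hex_bytes(encode('UTF-8', $infinity));     # E2 88 9E
my $inf_cp437 = hex_bytes(encode('cp437', $infinity));     # EC

# What the guessed truncation would keep: only the low byte of 0xC2B1.
my $truncated = 0xC2B1 & 0xFF;                             # 0xB1 == 177

# How the single byte 0xB1 is interpreted under each legacy code page.
my $byte = "\xB1";
my %b1_means = map {
    ($_ => sprintf('U+%04X', ord(decode($_, $byte))))
} qw(cp1252 iso-8859-1 cp850 cp437);

print "U+00B1 as UTF-8: $pm_utf8\n";
print "U+221E as UTF-8: $inf_utf8, as CP437: $inf_cp437\n";
printf "0xC2B1 & 0xFF = 0x%02X\n", $truncated;
print "0xB1 in $_ is $b1_means{$_}\n" for sort keys %b1_means;
```

U+2592 is the ▒ character, which is what CP850 and CP437 report for 0xB1.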

Using that codepage, this troublesome C library function (in mpc-1.3.x) by the name of "mpcr_out_str", then displays correctly when accessed from the Math::MPC module.

It would seem logical to me then that the library is outputting UTF-8.

Is there some way I can manipulate the active code page in Perl (on Windows) without shelling out to chcp ?

I should note I'm not an expert on this topic - but this works for me:

    use warnings;
    use strict;
    use open qw/:std :encoding(UTF-8)/;
    use Win32;
    Win32::SetConsoleOutputCP(65001);
    print "\N{U+B1}\N{U+221E}\n";

I'm thinking that, for Windows only, Math::MPC needs to change the codepage to 65001 before calling this function ... and then it ought also revert the codepage to its original setting immediately after the function has been run.

CP65001 is UTF-8, and IMHO UTF-8 is probably the most universal, so unless you've got some other funky Unicode stuff going on, I don't think you'd need to change it back. The boilerplate I showed above should be fine for the entire process, provided (according to the sources I found) that your terminal is using a Unicode-capable font.
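If you do want the change-then-revert behaviour described in the quoted question, a rough sketch might look like the following. The wrapper name `with_utf8_console` is hypothetical, and I'm assuming the callback is whatever code ends up calling the C function; only `Win32::GetConsoleOutputCP` and `Win32::SetConsoleOutputCP` come from the Win32 module itself.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with the console output code page set to
# 65001 (UTF-8) on Windows, then restore whatever was active before.
# On other platforms it just runs $code.
sub with_utf8_console {
    my ($code) = @_;
    my $on_windows = $^O eq 'MSWin32';
    my $saved;

    if ($on_windows) {
        require Win32;
        $saved = Win32::GetConsoleOutputCP();
        Win32::SetConsoleOutputCP(65001);
    }

    # Trap exceptions so the code page is restored even if $code dies.
    my @result = eval { $code->() };
    my $err = $@;

    Win32::SetConsoleOutputCP($saved) if $on_windows;

    die $err if $err;
    return wantarray ? @result : $result[0];
}

# Usage: the callback stands in for the call into the C library.
my $value = with_utf8_console(sub { return 42 });
```

One thing I have not verified: if the C library buffers its output, it may need flushing before the code page is restored, otherwise the bytes could reach the console after the revert.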

Re^4: Encoding problem with function in C library
by syphilis (Archbishop) on Dec 23, 2022 at 00:09 UTC
        use warnings;
        use strict;
        use open qw/:std :encoding(UTF-8)/;
        use Win32;
        Win32::SetConsoleOutputCP(65001);
        print "\N{U+B1}\N{U+221E}\n";

    That works nicely on Windows 10 and 11. But not on Windows 7, where I find that altering the codepage ostensibly succeeds, but in reality has no effect.
    Perhaps the explanation for that might be found in one of AM's links.
    Anyway, I can probably ignore this issue with Windows 7 and earlier. It's unlikely that anyone other than me would ever hit it.

    ... so unless you've got some other funky Unicode stuff going on, I don't think you'd need to change it back

    Yes, I think so. It seems that Win32::SetConsoleOutputCP(65001) sets the codepage for the duration of the program and that should generally be fine, whereas chcp 65001 sets it for the duration of the cmd.exe console (and that's not so acceptable).

    Thanks again for the pointers, guys !!

    Cheers,
    Rob
      It seems that Win32::SetConsoleOutputCP(65001) sets the codepage for the duration of the program

      As Corion pointed out, that's unfortunately not the case, the change does persist and you'll have to do something in an END block like he showed.
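This is not Corion's code, just a minimal sketch of the idea under my own assumptions: remember the code page that was active at startup and put it back in an END block. The variable name `$original_cp` is mine.

```perl
use strict;
use warnings;

my $original_cp;    # stays undef unless we actually changed the code page

if ($^O eq 'MSWin32') {
    require Win32;
    $original_cp = Win32::GetConsoleOutputCP();
    Win32::SetConsoleOutputCP(65001);    # 65001 is UTF-8
}

binmode STDOUT, ':encoding(UTF-8)';
print "\N{U+B1}\N{U+221E}\n";

# END blocks run on normal exit and after die(), but not if the process
# is terminated by an uncaught signal or replaced via exec.
END {
    Win32::SetConsoleOutputCP($original_cp) if defined $original_cp;
}
```

On anything other than Windows the script leaves the console alone and only sets the UTF-8 layer on STDOUT.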

        As Corion pointed out, that's unfortunately not the case, the change does persist and you'll have to do something in an END block like he showed.

        Oh ... ok, I'll deal with that. Thanks for the heads up.

        Cheers,
        Rob
      Anyway, I can probably ignore this issue with Windows 7 and earlier

      7 is already long past standard EoL and goes full EoL on the 10th of January (i.e., less than 3 weeks from now), so yes, it should definitely just be ignored.


      🦛

        7 is already long past standard EoL and goes full EoL on the 10th of January

        Like many of those who experienced Windows Vista, I still have a lot of love for Windows 7 ... for the simple reason that it wasn't Windows Vista ;-)

        Cheers,
        Rob