Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise monks,

I have a script which prints a character to STDOUT. Unfortunately I do not know the encoding of STDOUT.

Here I have my emacs editor. Its compilation window is encoded in UTF8. So if I execute this script within emacs the encoding of STDOUT should be in UTF8 to display the character correctly.

On the other hand I have the Windows command line (cmd). The encoding there is 'cp437', so STDOUT should be encoded in 'cp437'.

If I set the encoding of STDOUT to UTF8 then it is displayed correctly in the compilation window of emacs, but it is displayed wrong in the command line of windows. If I set the encoding of STDOUT to 'cp437' then it is correct in the windows command line but wrong in the emacs compilation window.

Here is my code:

#!/usr/bin/perl
use strict;
use warnings;
use charnames ':full';
use Encode;

# TODO: determine encoding of STDOUT
my $enc_of_stdout;

# compilation window of emacs is encoded in UTF8
$enc_of_stdout = 'utf8';

# cmd window in Windows XP is encoded in CP437
#$enc_of_stdout = 'cp437';

binmode(STDOUT,":encoding($enc_of_stdout)");

# same as: "\x{f2}"
my $text_str = "\N{LATIN SMALL LETTER O WITH GRAVE}";
print "$text_str\n";

My goal is for this script to work independently of the encoding of STDOUT. So the script should be able to find out the encoding of STDOUT at the beginning.

How can I find out what the encoding of STDOUT of the caller of the script is?

Or am I overcomplicating this, and is there an easier way?

Thank you for your help

Dirk

Re: Determine encoding of STDOUT
by moritz (Cardinal) on May 04, 2011 at 13:06 UTC
      When I run the following program on my Windows XP Pro box, it reports French_Belgium.1252:
      use Modern::Perl;
      use POSIX qw(locale_h);
      my $old_locale = setlocale(LC_CTYPE);
      say $old_locale;

      CountZero

      "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

      It seems that the character encoding cannot be extracted from locale because I get the following error message:

      Cannot figure out an encoding to use at test.pl line 9

        So you need some other mechanism for signaling encoding. For example have the user provide some environment variables if the encoding is different than UTF-8.
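A minimal sketch of the environment-variable idea, assuming a made-up variable name `MYSCRIPT_OUTPUT_ENCODING` (nothing in Perl or the OS defines it) and UTF-8 as the default. The helper `output_encoding` is hypothetical as well; it only checks that Encode knows the requested name before using it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ':full';
use Encode ();

# Hypothetical variable name, chosen for this sketch only.
my $ENV_VAR = 'MYSCRIPT_OUTPUT_ENCODING';

# Return the encoding to use for STDOUT: the user's override if it is set
# and known to Encode, otherwise UTF-8.
sub output_encoding {
    my ($env) = @_;
    my $wanted = $env->{$ENV_VAR};
    return 'UTF-8' unless defined $wanted && length $wanted;

    my $enc = Encode::find_encoding($wanted);
    unless ($enc) {
        warn "Unknown encoding '$wanted' in $ENV_VAR, falling back to UTF-8\n";
        return 'UTF-8';
    }
    return $enc->name;
}

my $enc_of_stdout = output_encoding(\%ENV);
binmode(STDOUT, ":encoding($enc_of_stdout)");
print "\N{LATIN SMALL LETTER O WITH GRAVE}\n";
```

In cmd.exe one would then run `set MYSCRIPT_OUTPUT_ENCODING=cp437` before starting the script, and leave the variable unset inside emacs.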

      Doesn't work. One would have to call WinAPI function GetACP
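A rough sketch of asking Windows directly. Note the swap: `GetACP` reports the ANSI code page (1252 in the example above), while cmd.exe uses the console output code page (437 here), so this sketch calls `Win32::GetConsoleOutputCP()` instead. It assumes a version of the `Win32` module that provides that function. The helper names and the UTF-8 fallback are illustrative choices, and the `-t STDOUT` test rests on the assumption that emacs runs the script through a pipe rather than a console.

```perl
use strict;
use warnings;

# Map a Windows code page number to an Encode name.
sub codepage_to_encoding {
    my ($cp) = @_;
    return $cp == 65001 ? 'UTF-8' : "cp$cp";
}

# Guess the encoding of an attached Windows console. Returns nothing when
# not on Windows, when STDOUT is not a terminal, or when Win32 is missing.
sub console_encoding {
    return unless $^O eq 'MSWin32';
    return unless -t STDOUT;
    return unless eval { require Win32; 1 };
    my $cp = Win32::GetConsoleOutputCP();   # assumed available, see above
    return $cp ? codepage_to_encoding($cp) : ();
}

# Assumption: anything that is not a Windows console expects UTF-8.
my $enc = console_encoding() || 'UTF-8';
binmode(STDOUT, ":encoding($enc)");
print "\x{f2}\n";
```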
Re: Determine encoding of STDOUT
by tchrist (Pilgrim) on May 04, 2011 at 12:50 UTC
    It really is much easier if you simply arrange that all consumers of your output expect UTF‑8, not some legacy encoding.

    For example, on a Mac or any other modern Unix system, I always set my window encoding to use UTF‑8. Setting it to use something like crufty old MacRoman would just set me up for a world of pain.

    No thanks!

      I think you are right. The best way is to write in the help of the script that its STDOUT is encoded as UTF8.

      I looked into how to change the encoding of "cmd.exe".

      I could do it temporarily as follows:

      • change font to "Lucida Console"
      • enter chcp 65001 to change the encoding to UTF8

      But the change to codepage 65001 is NOT stored permanently. If I open a new "cmd.exe" its codepage is 437 again.

      A bit offtopic because this question is windows specific and not perl: Do you know how to change the codepage permanently to 65001 in Windows XP?

      Would you recommend writing the following code at the beginning of the script:

      `chcp 65001`;

      That way I would ensure that the encoding is UTF8. Of course I should find out the active codepage (e.g. 437) at the beginning of the script and then restore it at the end (perhaps in an END block). And I should also do it depending on the OS. This, for example, would be a Windows-specific solution.
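The save-and-restore idea could look roughly like the sketch below. It is untested on Windows XP and makes a few assumptions: `chcp` without arguments prints the active code page as the last number on its line (the wording is localized), and the helper name `parse_chcp` is invented for this example.

```perl
use strict;
use warnings;

# Extract the code page number from chcp's output, e.g.
# "Active code page: 437". Only the trailing number is matched because
# the surrounding text differs between Windows language versions.
sub parse_chcp {
    my ($output) = @_;
    return unless defined $output;
    return $output =~ /(\d+)\D*\z/ ? $1 : undef;
}

my $saved_cp;

# Only touch the code page on Windows and only when writing to a console.
if ($^O eq 'MSWin32' && -t STDOUT) {
    my $current = parse_chcp(scalar `chcp`);
    if (defined $current && $current != 65001) {
        system('chcp 65001 >NUL');
        $saved_cp = $current;
    }
}

END {
    if (defined $saved_cp) {
        local $?;                        # keep the script's own exit status
        system("chcp $saved_cp >NUL");   # restore the original code page
    }
}

binmode(STDOUT, ':encoding(UTF-8)');
print "\x{f2}\n";
```

The END block does not run if the script is killed by a signal, so the console can still be left on 65001 in that case.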

        You might change the shortcut to Command Prompt to run the chcp automatically. For example my shortcut contains %windir%\system32\cmd.exe /F:ON /k doskey /macrofile="%USERPROFILE%\doskey.mac"

        The /k means "run this and stay open".

        You can also set a hotkey for the program and start it by, say, CTRL+SHIFT+`.

        Jenda
        Enoch was right!
        Enjoy the last years of Rome.

      The UTF-8 support for the Windows console is full of problems, although I wonder if some go away if a cygwin bash shell is used.
        If I want a portable program, I’ll generate Unicode output.

        If I want a non-portable program, I’ll generate output in some non-portable, legacy vendor encoding.

        I never ever do the second.

        It’s a shame that Microsoft is still lagging behind on proper Unicode support, but that is hardly Perl’s fault. Perl makes it easy to write portable programs, and bending over backwards to accommodate Microsoft-only idio(t)syncrasies seems like a self-limiting choice that serves only a very niche environment.

Re: Determine encoding of STDOUT
by Anonymous Monk on May 04, 2011 at 12:53 UTC