Determine encoding of STDOUT

Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise monks,

I have a script which prints a character to STDOUT. Unfortunately I do not know the encoding of STDOUT.

On one hand I have my Emacs editor, whose compilation window uses UTF-8. So if I run the script from within Emacs, STDOUT has to be encoded as UTF-8 for the character to display correctly.

On the other hand I have the Windows command line (cmd), where the encoding is 'cp437'. So there STDOUT has to be encoded as 'cp437'.

If I set the encoding of STDOUT to UTF-8, the character is displayed correctly in the Emacs compilation window but wrongly in the Windows command line. If I set it to 'cp437', it is correct in the Windows command line but wrong in the Emacs compilation window.

Here is my code:

#!/usr/bin/perl

use strict;
use warnings;

use charnames ':full';
use Encode;

# TODO: determine encoding of STDOUT
my $enc_of_stdout;

# compilation window of emacs is encoded in UTF8
$enc_of_stdout = 'utf8';

# cmd window in Windows XP is encoded in CP437
#$enc_of_stdout = 'cp437';

binmode(STDOUT,":encoding($enc_of_stdout)");

# same as: "\x{f2}"
my $text_str = "\N{LATIN SMALL LETTER O WITH GRAVE}";

print "$text_str\n";

My goal is for this script to work regardless of the encoding of STDOUT. So the script should be able to find out the encoding of STDOUT when it starts.

How can I find out which encoding the caller of the script expects on STDOUT?

Or am I making this too complicated, and is there an easier way?

Thank you for your help

Dirk


Replies are listed 'Best First'.
Re: Determine encoding of STDOUT by moritz (Cardinal) on May 04, 2011 at 13:06 UTC
If the character encoding can be reliably extracted from the locale, open helps: `use open IO => ':locale';` But as tchrist said, you'll be happier with all-UTF-8 environments in the long run.

Perl 6 - second systems done right
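To see what the locale actually reports before relying on `:locale`, the codeset can be queried directly. The following is only a sketch: `locale_encoding` is a hypothetical helper name, and it assumes the core module I18N::Langinfo is usable on the platform (it may not be on older Windows builds, in which case the helper returns undef).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);
use Encode ();

# Hypothetical helper: return the Encode name of the locale's codeset,
# or undef if the codeset cannot be determined or Encode does not know it.
sub locale_encoding {
    my $codeset = eval {
        require I18N::Langinfo;
        setlocale(LC_CTYPE, '');    # adopt the locale from the environment
        I18N::Langinfo::langinfo(I18N::Langinfo::CODESET());
    };
    return undef unless defined $codeset && length $codeset;
    my $enc = Encode::find_encoding($codeset);
    return $enc ? $enc->name : undef;
}

# Fall back to UTF-8 when the locale gives no usable answer.
my $enc = locale_encoding();
$enc = 'UTF-8' unless defined $enc;
binmode(STDOUT, ":encoding($enc)");
print "STDOUT encoding: $enc\n";
```

Note that this reports what the locale claims, which is not necessarily what the terminal or the Emacs compilation window really displays.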
Re^2: Determine encoding of STDOUT by CountZero (Bishop) on May 04, 2011 at 18:49 UTC
When I run the following program on my Windows XP Pro box, it reports French_Belgium.1252: `use Modern::Perl; use POSIX qw(locale_h); my $old_locale = setlocale(LC_CTYPE); say $old_locale;`

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James
Re^2: Determine encoding of STDOUT by Dirk80 (Pilgrim) on May 04, 2011 at 13:47 UTC
It seems that the character encoding cannot be extracted from the locale, because I get the following error message: `Cannot figure out an encoding to use at test.pl line 9`
Re^3: Determine encoding of STDOUT by moritz (Cardinal) on May 04, 2011 at 15:55 UTC
So you need some other mechanism for signaling the encoding. For example, have the user provide an environment variable if the encoding is something other than UTF-8.
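A minimal sketch of that idea, applied to the script from the question. The variable name `MYSCRIPT_OUTPUT_ENCODING` and the helper `pick_output_encoding` are made up for illustration; any name the script documents would do.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ':full';
use Encode ();

# Hypothetical environment variable name (an assumption, not a convention).
my $ENV_NAME = 'MYSCRIPT_OUTPUT_ENCODING';

# Return the user's choice if Encode knows that encoding, otherwise UTF-8.
sub pick_output_encoding {
    my ($requested) = @_;
    return 'UTF-8' unless defined $requested && length $requested;
    my $enc = Encode::find_encoding($requested);
    return $enc ? $enc->name : 'UTF-8';
}

my $enc_of_stdout = pick_output_encoding($ENV{$ENV_NAME});
binmode(STDOUT, ":encoding($enc_of_stdout)");

print "\N{LATIN SMALL LETTER O WITH GRAVE}\n";
```

In cmd the user would run `set MYSCRIPT_OUTPUT_ENCODING=cp437` before starting the script; in Emacs nothing needs to be set, because UTF-8 is the default.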
Re^2: Determine encoding of STDOUT by ikegami (Patriarch) on May 04, 2011 at 17:03 UTC
Doesn't work. One would have to call WinAPI function `GetACP`.
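For reference, a hedged sketch of asking Windows directly. It assumes the Win32 module that ships with Windows builds of Perl and its `Win32::GetConsoleOutputCP()` wrapper. As far as I know, `GetACP` reports the ANSI code page (for example 1252), while the console's output code page (for example 437) comes from `GetConsoleOutputCP`, so this sketch uses the latter. The helper names are made up, and the `cpNNN` mapping is an assumption that only holds for code pages Encode knows under that name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: map a Windows code page number to an Encode-style name.
sub codepage_to_encoding {
    my ($cp) = @_;
    return 'UTF-8' if $cp == 65001;    # 65001 is the UTF-8 code page
    return "cp$cp";                    # e.g. 437 -> cp437, 1252 -> cp1252
}

# Hypothetical helper: encoding of the attached console, or undef when
# not running on Windows or when no code page is reported.
sub console_encoding {
    return undef unless $^O eq 'MSWin32';
    my $cp = eval { require Win32; Win32::GetConsoleOutputCP() };
    return $cp ? codepage_to_encoding($cp) : undef;
}

my $enc = console_encoding();
$enc = 'UTF-8' unless defined $enc;    # fallback for everything else
binmode(STDOUT, ":encoding($enc)");
print "Using $enc for STDOUT\n";
```

This describes the console the process is attached to. When STDOUT is a pipe, as in the Emacs compilation window, the console code page is not necessarily what the reader of the pipe expects.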
Re: Determine encoding of STDOUT by tchrist (Pilgrim) on May 04, 2011 at 12:50 UTC
It really is much easier if you simply arrange that all consumers of your output expect UTF‑8, not some legacy encoding. For example, on a Mac or any other modern Unix system, I always set my window encoding to use UTF‑8. Setting it to use something like crufty old MacRoman would just set me up for a world of pain. No thanks!
Re^2: Determine encoding of STDOUT by Dirk80 (Pilgrim) on May 04, 2011 at 13:59 UTC
I think you are right. The best way is to state in the script's help text that its STDOUT is encoded as UTF-8.

I looked into how to change the encoding of "cmd.exe". I could do it temporarily as follows:

- change the font to "Lucida Console"
- enter `chcp 65001` to change the encoding to UTF-8

But the new code page 65001 is NOT stored permanently. If I open a new "cmd.exe", its code page is 437 again.

A bit off-topic, because this question is Windows-specific rather than about Perl: do you know how to change the code page permanently to 65001 in Windows XP?

Would you recommend putting the following code at the beginning of the script: `chcp 65001`; That way I would ensure that the encoding is UTF-8. Of course I would have to find out the active code page (e.g. 437) at the beginning of the script and restore it at the end (perhaps in an END block). I would also have to make this dependent on the OS, since it would be a Windows-specific solution.
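The save-and-restore idea described above could look roughly like this. It is a sketch only: it uses `Win32::GetConsoleOutputCP()` and `Win32::SetConsoleOutputCP()` from the Win32 module instead of shelling out to `chcp`, it assumes the setter returns a true value on success, and `switch_console_to_utf8` is a made-up name. On other operating systems it does nothing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use charnames ':full';

my $saved_cp;    # code page to restore on exit; undef if nothing was changed

# Hypothetical helper: switch the Windows console to code page 65001.
# Returns 1 if the code page was changed, 0 otherwise.
sub switch_console_to_utf8 {
    return 0 unless $^O eq 'MSWin32';
    return 0 unless eval { require Win32; 1 };
    my $old = Win32::GetConsoleOutputCP() or return 0;
    Win32::SetConsoleOutputCP(65001)      or return 0;
    $saved_cp = $old;
    return 1;
}

# Restore the original code page when the script ends.
END {
    Win32::SetConsoleOutputCP($saved_cp) if defined $saved_cp;
}

switch_console_to_utf8();
binmode(STDOUT, ':encoding(UTF-8)');
print "\N{LATIN SMALL LETTER O WITH GRAVE}\n";
```

The console font still has to be one that contains the characters, such as "Lucida Console".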
Re^3: Determine encoding of STDOUT by Jenda (Abbot) on May 04, 2011 at 15:46 UTC
You might change the shortcut to Command Prompt to run the chcp automatically. For example, my shortcut contains `%windir%\system32\cmd.exe /F:ON /k doskey /macrofile="%USERPROFILE%\doskey.mac"`. The /k means "run this and stay open". You can also set a hotkey for the program and start it with, say, CTRL+SHIFT+`.

Jenda
Enoch was right! Enjoy the last years of Rome.
Re^2: Determine encoding of STDOUT by ikegami (Patriarch) on May 04, 2011 at 17:00 UTC
The UTF-8 support for the Windows console is full of problems, although I wonder if some go away if a cygwin bash shell is used.
Re^3: Determine encoding of STDOUT by tchrist (Pilgrim) on May 04, 2011 at 17:45 UTC
If I want a portable program, I’ll generate Unicode output. If I want a non-portable program, I’ll generate output in some non-portable, legacy vendor encoding. I never ever do the second. It’s a shame that Microsoft is still lagging behind on proper Unicode support, but that is hardly Perl’s fault. Perl makes it easy to write portable programs, and bending over backwards to accommodate Microsoft-only idio(t)syncrasies seems like a self-limiting and very niche environment.
Re^4: Determine encoding of STDOUT by ikegami (Patriarch) on May 04, 2011 at 18:44 UTC
Re^4: Determine encoding of STDOUT by Anonymous Monk on May 04, 2011 at 18:28 UTC
Re: Determine encoding of STDOUT by Anonymous Monk on May 04, 2011 at 12:53 UTC
They call that locale, see perllocale.