Outputting Unicode to DOS

thekestrel has asked for the wisdom of the Perl Monks concerning the following question:

Re: Outputting Unicode to DOS by BrowserUk (Patriarch) on Aug 27, 2015 at 15:05 UTC
In all likelihood, all you need to do is change the codepage of the CLI session. For example, the default codepage on my system is 850, and if I print your test string:
`#! perl -slw
use strict;
my $str = "abc123äöüß";
print $str;`
this is what I get:
C:\test>chcp
Active code page: 850
C:\test>1140216.pl
abc123õ÷³▀
But if I change the codepage to the Windows Unicode codepage 65000, I get this:
C:\test>chcp 65000
Active code page: 65000
C:\test>1140216.pl
abc123äöüß
And if you need to automate the change of codepage, use:
`use Win32::Console; ... Win32::Console::OutputCP( 65000 ); ...`
BTW: The above was done using perl 5.10, and it still works as is with more modern versions:
`C:\test>\perl5.18\perl\bin\perl.exe 1140216.pl
abc123äöüß
C:\test>\Perl5.20\bin\perl.exe 1140216.pl
abc123äöüß
C:\test>\Perl22\bin\perl.exe 1140216.pl
abc123äöüß`
With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday' Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error. "Science is about questioning the status quo. Questioning authority". I knew I was on the right track :) In the absence of evidence, opinion is indistinguishable from prejudice. I'm with torvalds on this Agile (and TDD) debunked I told'em LLVM was the way to go. But did they listen!
Re^2: Outputting Unicode to DOS by Carbonblack (Novice) on Sep 06, 2019 at 08:52 UTC
Although this thread is quite old now, I would like to share my humble insights here. When I'm dealing with different encodings, I get the best results when I decode all inputs and encode all outputs. Text constants in the script also count as input and have to be decoded too. If the perl script is stored in e.g. cp850, then it's worth writing `my $text = decode("cp850","Text mit Umlauten äöüß ÄÖÜ");` You can omit the decode(..) if your text only contains standard ASCII characters. (Maybe it is wise to decode even then, to have all flags set correctly.)
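The "decode all inputs, encode all outputs" habit described above can be sketched as follows. This is only an illustration under stated assumptions: the input bytes are a hypothetical stand-in for text read from a cp850 file, and UTF-8 is an arbitrary choice of output encoding. Only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the four bytes that spell "äöüß" in cp850.
my $input_bytes = "\x84\x94\x81\xE1";

# Decode at the input boundary: bytes become Perl characters.
my $chars = decode('cp850', $input_bytes);

# Work on characters here (length, regexes and so on see 4 characters).

# Encode at the output boundary. UTF-8 is an assumption for this sketch.
my $output_bytes = encode('UTF-8', $chars);

printf "%d characters, %d input bytes, %d output bytes\n",
    length $chars, length $input_bytes, length $output_bytes;
```

Writing the input as byte escapes keeps the sketch independent of how the script file itself is saved, which is the same concern the post raises about text constants.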
Re: Outputting Unicode to DOS by aitap (Curate) on Aug 27, 2015 at 16:09 UTC
Please don't use the encoding pragma: it's deprecated. Instead, either don't recode anything (store the file in the same character encoding as your terminal uses, likely cp850; check the output of the `chcp` command), or store your program in UTF-8 and `use utf8`, so that your text is stored as characters and you can perform Unicode-related string operations, and then encode the strings you print back to bytes. The characters have to be output in some encoding, so if you do not encode them, Perl warns and outputs latin1 or UTF-8. Encoding by hand looks like this:
`use utf8;
use Encode 'encode';
my $str = "abc123äöüß";
print encode cp850 => $str;`
or, possibly with the help of Encode::Locale:
`use utf8;
use Encode; # explicitly use for its binmodes
use Encode::Locale;
binmode STDOUT, ":encoding(console_out)";
my $str = "abc123äöüß";
print $str;`
(The code is untested; you can also use `encode "locale", $unicode_string` and `binmode STDOUT, ":encoding(cp850)"`.) See also: perlunitut
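To see what an `:encoding(cp850)` layer actually writes, the sketch below prints through that layer into an in-memory file instead of the console. The in-memory handle and the variable names are assumptions made for illustration; on a real console you would apply the layer to STDOUT with binmode as shown above.

```perl
use strict;
use warnings;

# "abc123äöüß", with the umlauts written as escapes so the example
# does not depend on the encoding this file is saved in.
my $str = "abc123\x{E4}\x{F6}\x{FC}\x{DF}";

# An in-memory file stands in for the console (illustration only).
my $captured = '';
open my $out, '>:encoding(cp850)', \$captured or die "open failed: $!";
print {$out} $str;
close $out or die "close failed: $!";

# Dump the bytes that the encoding layer produced, in hex.
print join(' ', map { sprintf '%02X', ord } split //, $captured), "\n";
```

The umlauts come out as single cp850 bytes (84 94 81 E1), which is what a console set to codepage 850 expects.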
Re^2: Outputting Unicode to DOS by thekestrel (Friar) on Aug 28, 2015 at 12:05 UTC
Perhaps I spoke too soon. I thought that if I fixed printing to the console, the string would then be passed correctly to ImageMagick for processing:
`use utf8;
use Encode 'encode';
my $str = "convert -size 100x25 -background white -fill black -pointsize 25 label:ÄÖÜß ÄÖÜß.gif\n";
print encode cp850 => $str;
system encode cp850 => $str;`
The console (now) correctly outputs the string I want:
`convert -size 100x25 -background white -fill black -pointsize 25 label:ÄÖÜß ÄÖÜß.gif`
(This command simply creates a new image file called ÄÖÜß.gif with the text ÄÖÜß written as text in the image.) However, the file it creates is named wrong and the content is wrong (both Ž™šá). If I type that command exactly on the command line, it works. I can only assume that the encoding is getting confused as the string passes from perl through the shell into ImageMagick. I tried opening a file handle with the mode set to "|-" to write directly to ImageMagick, skipping the shell, but no dice. If you could impart some more wisdom, it would be appreciated. Paul.
Re^3: Outputting Unicode to DOS by aitap (Curate) on Aug 29, 2015 at 13:49 UTC
"(This command simply creates a new image file called ÄÖÜß.gif with the text ÄÖÜß written as text in the image) ... however the file it create is named wrong and the content is wrong (both Ž™šá)."
Ž™šá is exactly what happens if ÄÖÜß is encoded to cp850 (the "OEM" encoding on Windows, used in the console) and then wrongly decoded as cp1252 (the "ANSI" encoding on Windows, used in the ANSI versions of WinAPI). I think that for system you'll need Encode::Locale's "locale" encoding, as opposed to "console_out": while text for console input/output should be encoded to the OEM encoding (CP850 on a German system, CP866 on a Russian one), file names and commands for system should be encoded in the ANSI character set (CP1252 on a German system, CP1251 on a Russian one, etc.). You can also try to search for Unicode-related WinAPI wrappers for Perl (Does Win32::Unicode work? Is Win32::Process Unicode-aware?), but I can't give any advice on them. In conclusion, try:
`use utf8;
use Encode 'encode';
my $str = "convert -size 100x25 -background white -fill black -pointsize 25 label:ÄÖÜß ÄÖÜß.gif\n";
print encode cp850 => $str;
system encode cp1252 => $str;`
(or: `use Encode::Locale; print encode console_out => ...; system encode locale => ...;` so your program is portable across different locales in Windows, though finding a way to use W-suffixed WinAPI functions would be better.)
Since you are using ImageMagick, you can try its Perl binding, although getting Strawberry Perl to build it was not an easy task last time I tried it.
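The OEM/ANSI mix-up described above can be reproduced with the core Encode module alone, on any platform. A minimal sketch; the variable names are illustrative, and the code points are printed instead of the characters so the result does not depend on the terminal:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "ÄÖÜß" as characters, written with escapes.
my $text = "\x{C4}\x{D6}\x{DC}\x{DF}";

# What the failing script handed to system(): OEM (cp850) bytes.
my $oem_bytes = encode('cp850', $text);

# What an ANSI (cp1252) consumer makes of those bytes.
my $misread = decode('cp1252', $oem_bytes);

# What the ANSI consumer expects instead.
my $ansi_bytes = encode('cp1252', $text);

print 'misread as: ',
    join(' ', map { sprintf 'U+%04X', ord } split //, $misread), "\n";
print 'ANSI bytes: ',
    join(' ', map { sprintf '%02X', ord } split //, $ansi_bytes), "\n";
```

The misread code points are U+017D, U+2122, U+0161 and U+00E1, that is Ž™šá, matching the wrong file name reported in the parent post.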
Re^3: Outputting Unicode to DOS by BrowserUk (Patriarch) on Aug 28, 2015 at 12:35 UTC
"If you could impart some more wisdom, it would be appreciated."
It seems fairly likely that convert.exe isn't unicode-enabled. If so, there is nothing you can do about that from the outside.
Re^2: Outputting Unicode to DOS by thekestrel (Friar) on Aug 28, 2015 at 06:39 UTC
Thanks for the reply. Both snippets work like a charm =). Much appreciated.
Re: Outputting Unicode to DOS by 1nickt (Canon) on Aug 27, 2015 at 14:49 UTC
What output do you get? On Perl 5.22 under Darwin (Mac OSX) your code produces:
`Use of the encoding pragma is deprecated at 1140209.pl line 8.
abc123äöüß`
On the other hand, this code:
`#! perl
my $str = "abc123äöüß";
print $str;`
outputs as follows:
`$ perl 1140209.pl
abc123äöüß`
The way forward always starts with a minimal test.
Re^2: Outputting Unicode to DOS by thekestrel (Friar) on Aug 28, 2015 at 06:36 UTC
Hi, I was getting the following output: abc123├ñ├Â├╝├ƒ. I was able to fix it with the help of the other comments I read; I wasn't familiar with codepages.
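The reported output has the shape you would get if UTF-8 bytes were displayed by a console using codepage 850. That reading is an assumption, not something the thread confirms, but it can be checked with a short sketch using the core Encode module (variable names are illustrative):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "äöüß" as characters, written with escapes.
my $text = "\x{E4}\x{F6}\x{FC}\x{DF}";

# Assumption: the script emitted the text as UTF-8 bytes.
my $utf8_bytes = encode('UTF-8', $text);

# Assumption: the console interpreted every byte as cp850.
my $seen = decode('cp850', $utf8_bytes);

# Print code points rather than the characters themselves.
print join(' ', map { sprintf 'U+%04X', ord } split //, $seen), "\n";
```

Each umlaut turns into two characters, the first always U+251C (├), because every one of these letters starts with the byte C3 in UTF-8.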