thekestrel has asked for the wisdom of the Perl Monks concerning the following question:

Hi, It's been a while since I've used Perl / Perlmonks and most useful knowledge has long since eroded its way through my skull and dribbled onto the floor.
I'm using Active Perl (perl 5, version 20, subversion 2 (v5.20.2) built for MSWin32-x64-multi-thread) and want to be able to output some Unicode characters (specifically äöüß) so that I can automate some Imagemagick'ing.
At the DOS prompt I can type äöüß just fine (German keyboard) and can create things manually.
I 'assumed' something like the following would print out the same as seen in quotes, but it doesn't; the umlauts are replaced with other ASCII art.
#!/usr/bin/perl
use strict;
use warnings;
use encoding 'utf8';

my $str = "abc123äöüß";
print $str;
I read through...
http://perldoc.perl.org/perluniintro.html
http://www.perlmonks.org/?node_id=930785
http://perlmonks.org/?node_id=799088
..and have tried a number of the listed variants, but I'm missing something because nothing is working (I always get other characters).
Any hints at what I'm sure is a trivial problem would be appreciated.

Paul.

Re: Outputting Unicode to DOS
by BrowserUk (Patriarch) on Aug 27, 2015 at 15:05 UTC

    In all likelihood, all you need to do is change the codepage of the CLI session.

    For example, the default codepage on my system is 850, and if I print your test string:

    #! perl -slw
    use strict;

    my $str = "abc123äöüß";
    print $str;

    this is what I get:

    C:\test>chcp
    Active code page: 850
    
    C:\test>1140216.pl
    abc123õ÷³▀
    
    

    But if I change the codepage to the Windows Unicode codepage 65000, I get this:

    C:\test>chcp 65000
    Active code page: 65000
    
    C:\test>1140216.pl
    abc123äöüß
    

    And if you need to automate the change of codepage, use:

    use Win32::Console;
    ...
    Win32::Console::OutputCP( 65000 );
    ...

    BTW: The above was done using perl 5.10; and still works as is with more modern versions:

    C:\test>\perl5.18\perl\bin\perl.exe 1140216.pl
    abc123äöüß

    C:\test>\Perl5.20\bin\perl.exe 1140216.pl
    abc123äöüß

    C:\test>\Perl22\bin\perl.exe 1140216.pl
    abc123äöüß


      Although this thread is quite old now, I would like to share my humble insights here...

      When I'm dealing with different encodings, I get the best results when I decode all inputs and encode all outputs.
      Text constants in the script also count as input and have to be decoded too.

      If the perl script is stored in e.g. cp850, then it's worth it to write

       my $text = decode("cp850","Text mit Umlauten äöüß ÄÖÜ");

      You can omit the decode(..) if your text only contains standard ASCII characters.
      (Maybe it is wise to decode even then, so that all flags are set correctly.)
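
A minimal sketch of the decode-inputs, encode-outputs pattern described above. It assumes the literal's bytes are cp850; the `\x` escapes stand in for what a script saved as cp850 would contain, and UTF-8 on the output side is only an illustrative choice.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: these are the cp850 bytes for "äöüß", written as escapes so
# the example does not depend on how this file itself is saved.
my $raw = "\x84\x94\x81\xE1";

# Input side: turn bytes into Perl characters.
my $text = decode('cp850', $raw);

# Between decode and encode, string functions count characters.
my $chars = length $text;

# Output side: turn characters back into bytes for the destination.
# UTF-8 is an illustrative choice; a cp850 console would need 'cp850'.
my $bytes = encode('UTF-8', $text);

printf "%d characters, %d bytes as UTF-8\n", $chars, length $bytes;
```

The same four characters take four bytes in cp850 and eight in UTF-8, which is why the encoding has to be chosen per destination rather than once for the whole program.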

Re: Outputting Unicode to DOS
by aitap (Curate) on Aug 27, 2015 at 16:09 UTC

    Please don't use encoding: it's deprecated.

    Instead, do one of two things. Either don't recode anything: store the file in the same character encoding as your terminal uses (likely cp850; check the output of the chcp command). Or store your program in UTF-8 and use utf8, so that your text is held as characters and Unicode-aware string operations work, then encode the strings back to bytes when you print them. The characters have to go out in some encoding, so if you do not encode them, Perl either outputs latin1 or warns and outputs UTF-8. Encode::Locale can help with the encoding step:

    use utf8;
    use Encode 'encode';

    my $str = "abc123äöüß";
    print encode cp850 => $str;

    or, with Encode::Locale:

    use utf8;
    use Encode; # explicitly use for its binmodes
    use Encode::Locale;

    binmode STDOUT, ":encoding(console_out)";
    my $str = "abc123äöüß";
    print $str;

    (code is untested; you can also use encode "locale", $unicode_string and binmode STDOUT, ":encoding(cp850)")
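
The remark about what Perl does with unencoded characters can be observed with an in-memory filehandle. This is only a sketch: `bytes_written` is a hypothetical helper, not part of any module, and the handle deliberately has no encoding layer.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

# Hypothetical helper: print a character string to a handle that has no
# encoding layer and return the bytes that were actually written.
sub bytes_written {
    my ($text) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "close failed: $!";
    return $buffer;
}

my $narrow = bytes_written("\N{U+00E4}");            # a-umlaut fits in latin1
my $wide   = bytes_written("\N{U+00E4}\N{U+20AC}");  # the euro sign does not

printf "narrow: %s\n", join ' ', map { sprintf '%02X', ord } split //, $narrow;
printf "wide:   %s\n", join ' ', map { sprintf '%02X', ord } split //, $wide;
printf "warnings: %d\n", scalar @warnings;
```

The first string goes out as a single latin1 byte without complaint; the second triggers a "Wide character" warning and goes out as UTF-8. Encoding explicitly before printing avoids depending on either fallback.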

    See also: perlunitut

      Perhaps I spoke too soon.. I thought if I fixed printing to the console then it would then be passed correctly to ImageMagick for processing...

      use utf8;
      use Encode 'encode';

      my $str = "convert -size 100x25 -background white -fill black -pointsize 25 label:ÄÖÜß ÄÖÜß.gif\n";
      print encode cp850 => $str;
      system encode cp850 => $str;

      The console now correctly shows the string I want:

      convert -size 100x25 -background white -fill black -pointsize 25 label:ÄÖÜß ÄÖÜß.gif

      (This command simply creates a new image file called ÄÖÜß.gif with the text ÄÖÜß written as text in the image)
      ... however the file it creates is named wrong and the content is wrong (both Ž™šá). If I type that command exactly on the command line, it works. I can only assume that the encoding is getting confused as it passes from Perl through the shell into ImageMagick?

      I tried opening a file handle with the mode set to "|-" to write directly to ImageMagick and skip the shell, but no dice. If you could impart some more wisdom, it would be appreciated.

      Paul.

        (This command simply creates a new image file called ÄÖÜß.gif with the text ÄÖÜß written as text in the image) ... however the file it creates is named wrong and the content is wrong (both Ž™šá).
        Ž™šá is exactly what happens if ÄÖÜß is encoded to cp850 ("OEM" encoding on Windows, used in console) and then wrongly decoded as cp1252 ("ANSI" encoding on Windows, used in ANSI versions of WinAPI).
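
A short sketch that reproduces this misreading with Encode alone, without involving the console or ImageMagick. It assumes the snippet itself is saved as UTF-8; the variable names are illustrative.

```perl
use strict;
use warnings;
use utf8;    # assumption: this file is saved as UTF-8
use Encode qw(encode decode);

my $intended = "ÄÖÜß";

# What the script sent: the text encoded for the console (OEM, cp850).
my $bytes = encode('cp850', $intended);

# What a consumer using the ANSI code page makes of those same bytes.
my $misread = decode('cp1252', $bytes);

binmode STDOUT, ':encoding(UTF-8)';   # so the demo itself prints cleanly
printf "bytes sent: %s\n", join ' ', map { sprintf '%02X', ord } split //, $bytes;
print  "read back:  $misread\n";
```

The bytes never change between the two steps; only the code page used to interpret them does.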

        I think that for system you'll need Encode::Locale's "locale" encoding, as opposed to "console_out". Text for console input/output should be encoded to the OEM encoding (CP850 on a German system, CP866 on a Russian one), whereas file names and commands for system should be encoded in the ANSI character set (CP1252 on a German system, CP1251 on a Russian one, etc.). You can also try to search for Unicode-related WinAPI wrappers for Perl (Does Win32::Unicode work? Is Win32::Process Unicode-aware?), but I can't give any advice on them.

        In conclusion, try:

        use utf8;
        use Encode 'encode';

        my $str = "convert -size 100x25 -background white -fill black -pointsize 25 label:ÄÖÜß ÄÖÜß.gif\n";
        print encode cp850 => $str;
        system encode cp1252 => $str;

        (or: use Encode::Locale; print encode console_out => ...; system encode locale => ...; so your program is portable across different locales in Windows, though finding a way to use W-suffixed WinAPI functions would be better)
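
The split between console and system encodings could be wrapped in a small helper, sketched here. `echo_and_run` and its `$runner` callback are hypothetical names, the two code pages are hard-coded assumptions for a German Windows setup, and the dry run uses a harmless `echo` instead of ImageMagick.

```perl
use strict;
use warnings;
use utf8;    # assumption: this file is saved as UTF-8
use Encode qw(encode);

# Assumed code pages for a German Windows setup; other locales differ.
my $CONSOLE_CP = 'cp850';     # OEM code page, what the console displays
my $SYSTEM_CP  = 'cp1252';    # ANSI code page, what system() hands on

# Hypothetical helper: echo a command to the console and run it, encoding
# the same character string separately for each destination. The $runner
# callback defaults to system() and can be swapped for a dry run.
sub echo_and_run {
    my ($command, $runner) = @_;
    $runner ||= sub { system $_[0] };
    print encode($CONSOLE_CP, "$command\n");
    return $runner->( encode($SYSTEM_CP, $command) );
}

# Dry run: capture the bytes that would reach system() instead of running them.
my $seen;
echo_and_run('echo ÄÖÜß', sub { $seen = $_[0]; 0 });
printf "%s\n", join ' ', map { sprintf '%02X', ord } split //, $seen;
```

Keeping the command as one character string and encoding it at each boundary means the code page names are the only thing to change for another locale, or to replace with Encode::Locale's console_out and locale.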

        Since you are using ImageMagick, you can try its Perl binding, although getting Strawberry Perl to build it was not an easy task last time I tried it.

      Thanks for the reply. Both snippets work like a charm =). Much appreciated.
Re: Outputting Unicode to DOS
by 1nickt (Canon) on Aug 27, 2015 at 14:49 UTC

    What output do you get? On Perl 5.22 under Darwin (Mac OS X) your code produces:

    Use of the encoding pragma is deprecated at 1140209.pl line 8.
    abc123äöüß

    On the other hand this code:

    #! perl
    my $str = "abc123äöüß";
    print $str;

    outputs as follows:

    $ perl 1140209.pl
    abc123äöüß

    The way forward always starts with a minimal test.
      Hi, I was getting the following output:

      abc123├ñ├Â├╝├ƒ

      I was able to fix it with the help of the other comments; I wasn't familiar with the codepage.