http://qs1969.pair.com?node_id=11136763

BillKSmith has asked for the wisdom of the Perl Monks concerning the following question:

I have written the first draft of a module that adds an encoding layer to STDOUT so that non-ASCII characters print correctly in a CMD window (under Windows 7). Consider the following one-liner in a CMD window:
perl -e"print qq(\xe4)"

It incorrectly displays a Greek sigma.

Now with my module

perl -MDOS::Try -e"print qq(\xe4)"

It now displays the correct character.

So far so good! Now I want to automate this test. I thought that I could run this script in backticks and capture its STDOUT.

use strict;
use warnings;
my $result;
$result = `perl -MDOS::Try -e"print qq(\xe4)"`;
print $result;

This script displays the Greek sigma. The module does not work in this environment.

I need help finding either a better way to test this module or a way to rewrite it that avoids the problem. The 'guts' of the module (below) consist of three statements cut and pasted from examples in the documentation of open (with minor edits as necessary).

package DOS::Try;
use strict;
use warnings;
open(my $oldout, ">&STDOUT") or die "Can't dup STDOUT: $!";
close STDOUT;
open(STDOUT, ">&:encoding(Cp437)", $oldout) or die "Can't dup \$oldout: $!";
1;
Bill

Replies are listed 'Best First'.
Re: print in CMD window
by pryrt (Abbot) on Sep 14, 2021 at 20:22 UTC
    You don't need to close the old STDOUT and create a new one with the CP437 encoding; you can just binmode the existing STDOUT:
    C:\Users\peter.jones\Downloads\TempData\perl>perl -Ilib -e"print qq(\xe4)"
    Σ
    C:\Users\peter.jones\Downloads\TempData\perl>perl -Ilib -MDOS::Try -e"print qq(\xe4)"
    ä
    
    vs
    
    C:\Users\peter.jones\Downloads\TempData\perl>perl -e"print qq(\xe4)"
    Σ
    C:\Users\peter.jones\Downloads\TempData\perl>perl -e"binmode STDOUT, ':encoding(Cp437)'; print qq(\xe4)"
    ä
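
The effect of the binmode call can also be checked without a console. The following is a minimal sketch, not part of the original reply: it writes the same character to an in-memory filehandle, once with no layer and once with `:encoding(Cp437)`, and reports the bytes that came out. The helper name `bytes_written` is hypothetical.

```perl
use strict;
use warnings;

# Write U+00E4 to an in-memory handle, optionally through an I/O layer,
# and return the bytes that were actually written, as a hex string.
# (bytes_written is a hypothetical helper used only for this illustration.)
sub bytes_written {
    my ($layer) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "Can't open in-memory handle: $!";
    if (defined $layer) {
        binmode($fh, $layer) or die "binmode failed: $!";
    }
    print {$fh} "\xe4";
    close $fh or die "close failed: $!";
    return unpack 'H*', $buffer;
}

print "no layer:    ", bytes_written(), "\n";                    # e4
print "Cp437 layer: ", bytes_written(':encoding(Cp437)'), "\n";  # 84
```

Because the handle is in memory, the result does not depend on the console's code page or font.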
    

    And your test fails for the same reason that you wrote the module: the output of the test script itself needs the right encoding, not just the code run inside the `...`.

    #!perl
    
    use strict;
    use warnings;
    my $result = `perl -Ilib -MDOS::Try -e"print qq(\xe4)"`;
    print "first test: $result\n";
    
    use lib 'lib';
    require DOS::Try;
    print "second test: $result\n";
    __END__
    first test: Σ
    second test: ä
    

    You can see more if you hex dump the bytes being output from the two variants of the oneliner:

    C:\Users\peter.jones\Downloads\TempData\perl>perl -e"print qq(\xe4)" | perl -e "print unpack 'H*',$_ for <>"
    e4
    C:\Users\peter.jones\Downloads\TempData\perl>perl -e"binmode STDOUT, ':encoding(Cp437)'; print qq(\xe4)" | perl -e "print unpack 'H*',$_ for <>"
    84
    
Re: print in CMD window
by LanX (Saint) on Sep 14, 2021 at 23:50 UTC
    > It incorrectly displays a greek sigma.

    not for me on Win10, it's an "õ"

    and this depends on the codepage and font configured for the CMD.

    My properties tell me that I have CP 850 (OEM Multilingual Latin 1) preset, and if I change the font to raster font (or whatever the English translation is) I see a capital sigma.
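
The different results reported in this thread follow from the code page tables. As a small sketch (not from the original posts), the core Encode module can show which character the single byte 0xE4 stands for under each code page mentioned here:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Interpret the single byte 0xE4 under each code page named in the thread
# and report the Unicode code point it maps to.
my %shown_as;
for my $code_page (qw(cp437 cp850 cp1252)) {
    my $char = decode($code_page, "\xe4");
    $shown_as{$code_page} = sprintf 'U+%04X', ord $char;
    print "$code_page: $shown_as{$code_page}\n";
}
# cp437:  U+03A3 (capital sigma)
# cp850:  U+00F5 (o with tilde)
# cp1252: U+00E4 (a with diaeresis)
```

So an unencoded 0xE4 byte appears as a sigma on a CP437 console and as "õ" on a CP850 console, which matches both observations above.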

    > It now displays the correct character

    which is???

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      not for me on Win10, it's an "õ"

      Same for me on my Windows 7 machine - cp850 for the cmd.exe console; cp1252 for text files.
      I still have an old Text::Iconv script (that I hadn't used for years) that converts between the 2.

      Cheers,
      Rob
Re: print in CMD window
by BillKSmith (Monsignor) on Sep 15, 2021 at 18:45 UTC
    Thanks to all for the help. Anonymous Monk's links suggest that I have an X-Y problem where the X has already been solved. This seems like the ideal solution, but so far I have not been able to make it work. I have a lot to learn about terminals etc.

    I had rejected the binmode idea because I expected it to remove the :crlf layer. I forgot that I should test all possible solutions for that.
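
One way to check that concern directly is to list the layers on a handle before and after the binmode call. This is only a sketch: it uses a duplicate of STDOUT so the real handle is left alone, and the layer names printed depend on the platform and Perl build, so no expected output is given.

```perl
use strict;
use warnings;

# Duplicate STDOUT so the experiment does not alter the real handle.
open my $fh, '>&', \*STDOUT or die "Can't dup STDOUT: $!";

# PerlIO::get_layers is built into perl and returns the layer names
# currently stacked on a filehandle.
my @before = PerlIO::get_layers($fh);

# binmode with an explicit layer pushes that layer on top of the stack.
binmode($fh, ':encoding(Cp437)') or die "binmode failed: $!";
my @after = PerlIO::get_layers($fh);

print "before: @before\n";
print "after:  @after\n";
close $fh;
```

If the layers present before the call still appear in the second list, the encoding layer was added on top of them rather than replacing them.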

    The observation that my test has the very flaw that I was trying to fix is interesting. Despite the error, it still demonstrates my problem. The fact that Result1 is a sigma proves that the shell script did not encode the \xe4. In fact Result2 masks the problem by encoding the character in the main program.

    Sorry for the confusion to other Windows users. My solution is only intended for 'hobby' use on my own system. I do not expect to use it for anything except an occasional download from perlmonks.

    I probably will end up with the binmode solution. I am still trying to devise a suitable test. Most of my difficulty is unrelated to the problem (e.g. quotes and escapes).

    Bill
      Bill,

      The fact that Result1 is a sigma proves that the shell script did not encode the \xe4

      I disagree, because the `` has some implicit translations going on that you aren't controlling.

      To test it, you need to capture the raw output bytes of the -e under test. Then you can compare them to the expected values.

      I show an example test where I print \xe4\xe0 twice: the first time, without a binmode in the oneliner; the second time, with a binmode in the oneliner. You can see that the bytes that are output are different. You can test that those bytes match your expectations.

      C:\Users\peter.jones\Downloads\TempData\perl>chcp
      Active code page: 437
      C:\Users\peter.jones\Downloads\TempData\perl>perl pm.pl
      
      __SOURCE__
      #!perl
      
      use 5.012; # strict, //
      use warnings;
      use IPC::Open2;
      use Test::More;
      
      undef $\;
      print "\n__SOURCE__\n";
      seek \*DATA, 0, 0;
      print for <DATA>;
      
      $\ = "\n";
      {
          my $pid = open2(my $ofh, my $ifh, 'perl', '-e', q("print qq(\xe4\xe0)"));
          binmode $ofh, ':raw';   # need to read from the open2 output file handle in raw mode, so you're looking at bytes, _not_ characters
          chomp(my $line = <$ofh>);
          print "without binmode, the high 8-bit characters pass through untranslated: ", unpack 'H*', $line;
          is $line, "\xE4\xE0", 'the bytes should be unedited';
          print "and printed out during test script: '$line'";
      }
      {
          my $pid = open2(my $ofh, my $ifh, 'perl', '-e', q("binmode STDOUT, ':encoding(Cp437)'; print qq(\xe4\xe0)"));
          binmode $ofh, ':raw';   # need to read from the open2 output file handle in raw mode, so you're looking at bytes, _not_ characters
          chomp(my $line = <$ofh>);
          print "with binmode, xE4 gets translated to x84 for a-umlaut, and xE0 gets translated to x85 for a-grave: ", unpack 'H*', $line;
          is $line, "\x84\x85", 'the bytes should be CP437-encoded';
          # so here, instead of printing the hexdump of the captured line, you could compare
          print "and printed out during test script: '$line'";
      }
      done_testing();
      
      __END__
      __OUTPUT__
      without binmode, the high 8-bit characters pass through untranslated: e4e0
      ok 1 - the bytes should be unedited
      and printed out during test script: 'Σα'
      with binmode, xE4 gets translated to x84 for a-umlaut, and xE0 gets translated to x85 for a-grave: 8485
      ok 2 - the bytes should be CP437-encoded
      and printed out during test script: 'äà'
      1..2
      

        Thank you pryrt. Your second example is exactly what I asked for in my first post. I have already extended it to test newlines by appending \n to the input string and \r\n to the expected output string (and removing chomp). My project started as a minor annoyance and ended with a one-line solution. I never dreamed that in between, I would need to learn details of Windows (including a reference to an old DOS manual to get started), Unicode, and even Perl (I have never used a child process).
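
The newline expectation described above can be illustrated at the byte level without a child process. This is a sketch under stated assumptions, not Bill's actual test: it writes to a temporary file, and it names the `:crlf` layer explicitly to stand in for the default that Perl applies to text handles on Windows.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a scratch file that is removed when the program exits.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
close $tmp_fh or die "close failed: $!";

# Write one character and a newline through :crlf plus the Cp437 encoding.
open my $out, '>:crlf:encoding(Cp437)', $tmp_name
    or die "Can't open $tmp_name for writing: $!";
print {$out} "\xe4\n";
close $out or die "close failed: $!";

# Read the file back in raw mode so we see bytes, not characters.
open my $in, '<:raw', $tmp_name or die "Can't open $tmp_name for reading: $!";
my $bytes = do { local $/; <$in> };
close $in or die "close failed: $!";

print unpack('H*', $bytes), "\n";   # 840d0a
```

The character is encoded to 0x84 and the newline becomes the 0x0D 0x0A pair, which is the "\r\n" a test would compare against.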
        Bill
Re: print in CMD window
by Anonymous Monk on Sep 15, 2021 at 08:49 UTC
A reply falls below the community's threshold of quality. You may see it by logging in.