
Re^2: Perl, DOS and encodings

by siberia-man (Friar)
on Apr 29, 2020 at 20:02 UTC


in reply to Re: Perl, DOS and encodings
in thread Perl, DOS and encodings

Thank you for your response. I've just tested the suggestion from the superuser.com answer. To be honest, without your explanation that answer doesn't give many clues. Just compare the two outputs.

As I have already said, the code page defaults to 866 (IBM CP866, the old code page dating back to MS-DOS 4.01). BodyName = koi8-r is yet another code page, 20866. How this actually works, I don't know; cmd.exe is definitely painful.
    C:\>chcp
    Active code page: 866

    C:\>powershell -c "[System.Text.Encoding]::Default"

    IsSingleByte      : True
    BodyName          : koi8-r
    EncodingName      : Cyrillic (Windows)
    HeaderName        : windows-1251
    WebName           : windows-1251
    WindowsCodePage   : 1251
    IsBrowserDisplay  : True
    IsBrowserSave     : True
    IsMailNewsDisplay : True
    IsMailNewsSave    : True
    EncoderFallback   : System.Text.InternalEncoderBestFitFallback
    DecoderFallback   : System.Text.InternalDecoderBestFitFallback
    IsReadOnly        : True
    CodePage          : 1251

I tested the command from my opening post with different code pages, setting it to 1251 or 65001 (UTF-8). The only encoding that works for Cyrillic text on the command line is 1251. The default locale in Cygwin is en_US.UTF-8.
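
To make the difference between the code pages mentioned here concrete, this is a minimal sketch that prints the bytes of a single Cyrillic letter in each of them. The helper name hex_bytes is made up for illustration; only the core Encode module is used.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the bytes of $text in $encoding
# as an upper-case hex string.
sub hex_bytes {
    my ($text, $encoding) = @_;
    return uc(unpack('H*', encode($encoding, $text)));
}

my $letter = "\x{042F}";    # CYRILLIC CAPITAL LETTER YA

# The same character is a different byte (or byte pair) in every encoding.
for my $encoding (qw(cp866 cp1251 koi8-r UTF-8)) {
    printf "%-7s %s\n", $encoding, hex_bytes($letter, $encoding);
}
```

The letter comes out as 9F in cp866, DF in cp1251, F1 in koi8-r and D0AF in UTF-8, which is why bytes decoded with the wrong code page turn into different Cyrillic letters or into garbage.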

Updated:

I tested the script by invoking it from a shell script and from a batch script. It works correctly if the title's encoding matches the encoding of the shell script. Only in the batch script does code page 1251 have to be specified, regardless of the encoding of the batch script itself.

Replies are listed 'Best First'.
Re^3: Perl, DOS and encodings
by haj (Vicar) on Apr 29, 2020 at 22:10 UTC

    The relevant information is the WindowsCodePage entry. This is the encoding which cmd.exe uses to pass Cyrillic characters from your terminal input to your Perl program, and you cannot change it using chcp.

    According to my experiments, which may be totally bogus, things get even more interesting if you write your command, including command line parameters with Cyrillic characters, into a .bat file and execute that. In that case, the chcp setting will be used to decode the batch file, but the Perl program will still receive its @ARGV in the WindowsCodePage encoding.

    So, if your batch file is UTF-8 encoded, you need to chcp 65001 and use --title-transcode=cp1251 if you pass the title as a command line parameter.
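
    What "receiving @ARGV in the WindowsCodePage encoding" means for the script can be sketched as follows. This is only an illustration, not the code from the opening post: decode_args is a hypothetical helper, and cp1251 is hard-coded on the assumption that WindowsCodePage is 1251, as in the output shown earlier in the thread.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode raw argument bytes into Perl character strings.
sub decode_args {
    my ($encoding, @raw) = @_;
    return map { decode($encoding, $_) } @raw;
}

# Assumption: the arguments arrive as cp1251 bytes, whatever chcp reports.
my @args = decode_args('cp1251', @ARGV);

# Write the decoded arguments as UTF-8, for example when redirecting to a file.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @args;
```

    Once the arguments are decoded to characters, the output encoding can be chosen independently of the encoding in which they arrived.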

      Oh, yes. I tested it, but my experiments were not complete. At least in pure DOS I have to specify cp1251 as the encoding for parameters on the command line. It doesn't depend on the file encoding. If I run the same script under ConEmu, I have to specify the same encoding as that of the file itself.
