So which characters can I expect to be printed the same on different platforms? Only 32..127? What about 128..255?

You need to provide what the terminal expects. You can probably rely on the character set being based on ASCII, so you should be able to print ASCII's basic whitespace characters (9, 10, 13, 32) and its non-whitespace printable characters (33..126) without problems. (127 is a control character.)
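As a minimal sketch of that rule, the check below reports whether a string contains only the characters listed above. The helper name is_portable_ascii is hypothetical and only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every character of $s is one of the
# ASCII characters listed above (9, 10, 13, or 32..126).
sub is_portable_ascii {
    my ($s) = @_;
    return $s =~ /\A[\x09\x0A\x0D\x20-\x7E]*\z/ ? 1 : 0;
}

for my $s ("Hello, world!\n", "caf\x{E9}", "\x7F") {
    print is_portable_ascii($s) ? "portable\n" : "needs encoding\n";
}
```

Only the first string passes. The second contains U+00E9 and the third is character 127, the control character mentioned above.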

If you want to print other characters, you will need to correctly encode your output.

If you expect to receive other characters, you will need to correctly decode your input.

You can do this using the following:

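    BEGIN {
        if ($^O eq 'MSWin32') {
            require Win32;
            my $cie = "cp" . Win32::GetConsoleCP();        # Console input code page
            my $coe = "cp" . Win32::GetConsoleOutputCP();  # Console output code page
            my $ae  = "cp" . Win32::GetACP();              # ANSI code page

            binmode(STDIN,  ":encoding($cie)");
            binmode(STDOUT, ":encoding($coe)");
            binmode(STDERR, ":encoding($coe)");

            # Default layer for files opened by Perl.
            require open;
            "open"->import(":encoding($ae)");

            # Arguments arrive encoded using the ANSI code page.
            require Encode;
            @ARGV = map { Encode::decode($ae, $_) } @ARGV;
        }
        else {
            # Relies on a private function of encoding.pm; falls back to UTF-8.
            require encoding;
            my $e = encoding::_get_locale_encoding() // 'UTF-8';

            # Same encoding for STDIN, STDOUT, STDERR and files opened by Perl.
            require open;
            "open"->import(':std', ":encoding($e)");

            require Encode;
            @ARGV = map { Encode::decode($e, $_) } @ARGV;
        }
    }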
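For comparison, here is a small sketch of what the :encoding layers do on your behalf, written by hand with Encode. It assumes, purely for illustration, that the terminal expects UTF-8. The variable names are made up for this example.

```perl
use strict;
use warnings;
use Encode qw( encode decode );

# Assumption for illustration: the terminal expects UTF-8.
my $enc = 'UTF-8';

my $text  = "caf\x{E9}";           # 4 characters; the last is U+00E9
my $bytes = encode($enc, $text);   # The bytes that would actually be written
my $back  = decode($enc, $bytes);  # What a reader decoding them would recover

printf "%d characters, %d bytes\n", length($text), length($bytes);
print $back eq $text ? "round trip ok\n" : "round trip failed\n";
```

This prints "4 characters, 5 bytes" followed by "round trip ok", since U+00E9 takes two bytes in UTF-8.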

Note: While UTF-8 is probably the only encoding you need to deal with on modern unix systems, you have to deal with four different encodings on Windows.

System calls are made using either the system's "ANSI" interface (e.g. cp1252) or its "Wide" (UTF-16le) interface. (Perl itself only uses the ANSI interface, though modules can still use either or both.) The ANSI code page is set system-wide (by the system locale), not per console.

The console uses a configurable encoding known as the OEM code page (e.g. cp437, cp850). The default OEM code page is based on your language settings. (For some reason, a console's input and output encodings can differ, but I have no idea how/why that would happen.)

Finally, lots of the data you encounter is encoded using UTF-8. This brings up two "unanswerable" questions:

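The code above uses the ANSI code page for files opened by Perl and the console's code pages for STDIN, STDOUT and STDERR. This means that printing to a file opened by Perl and printing to STDOUT redirected to a file will produce files encoded using different encodings, but it also means that foo | find "bar" will produce readable output.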
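To make the difference concrete, this sketch encodes one character using an example of each of the four encodings mentioned in this note. The code pages chosen (cp1252, cp437) are just the examples used here, not necessarily your system's, and hex_bytes is a hypothetical helper.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper: the bytes of $char in $enc, as space-separated hex.
sub hex_bytes {
    my ($enc, $char) = @_;
    return join ' ',
        map { sprintf('%02X', ord($_)) }
        split(//, encode($enc, $char));
}

my $char = "\x{E9}";   # U+00E9, LATIN SMALL LETTER E WITH ACUTE

for my $enc (qw( cp1252 cp437 UTF-16LE UTF-8 )) {
    printf "%-8s %s\n", $enc, hex_bytes($enc, $char);
}
```

The character comes out as E9 in cp1252, 82 in cp437, E9 00 in UTF-16LE and C3 A9 in UTF-8, so a file written through one encoding will not read back correctly through another.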

Note: On Windows, the arguments are always provided encoded using the system's Active (aka "ANSI") code page (e.g. 1252), not the console's (aka "OEM") code page (e.g. 437, 850, 65001), so only characters that exist in both the ANSI and OEM code pages can be passed via arguments. Even if the console is using UTF-8, arguments are limited to the machine's ANSI character set. This limit can be worked around by obtaining the command line using GetCommandLineW and re-parsing it.


In reply to Re: [OT] ASCII, cmd.exe, linux console, charset, code pages, fonts and other amenities by ikegami
in thread [OT] ASCII, cmd.exe, linux console, charset, code pages, fonts and other amenities by Discipulus
