Hello dear Monks,

I come with an issue I encountered recently.

I have a Perl script, git-md-toc, that generates a table of contents (TOC) from a Markdown file and embeds it into the original file.

It worked fine with the Latin charset. Later I found that it doesn't work with other encodings, so I extended it to accept a particular encoding via an additional command-line option. That works fine as well (I tested it under Cygwin). However, it fails in a DOS session when a TOC title has to be added to a file written in a non-Latin charset/encoding.

For example, there is a test file in UTF-8 containing some Cyrillic text. I need to update it by adding a TOC with a title in Russian.

This command in bash works fine (Perl 5.30 shipped with Cygwin):

git-md-toc -ut "some-text-in-russian" -Tutf8 "utf8-cyrillic.md"

But it fails in DOS sessions: the title is added in the wrong encoding. To resolve the issue I have to use one more option (standalone Strawberry Perl 5.30):

git-md-toc --title-transcode=cp1251 -ut "some-text-in-russian" -Tutf8 +"utf8-cyrillic.md"

What confuses me is that the default DOS code page is 866, yet the encoding I have to specify for the title is 1251.
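
To make the mismatch concrete, here is a minimal sketch of the kind of transcoding step the extra option implies. It is not the actual git-md-toc code: the sub name transcode_title is hypothetical, and it assumes the title reaches the script as raw cp1251 bytes, as the workaround suggests. It also shows how the same bytes come out if they are read as cp866 instead.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not taken from git-md-toc): interpret raw bytes
# as being in the encoding $from and return the same text as UTF-8 bytes.
sub transcode_title {
    my ($raw_bytes, $from) = @_;
    my $chars = decode($from, $raw_bytes);   # bytes -> Perl characters
    return encode('UTF-8', $chars);          # characters -> UTF-8 bytes
}

# The Russian word "Привет" written as cp1251 bytes; this stands in for
# a title arriving on the command line (an assumption for this sketch).
my $cp1251 = "\xCF\xF0\xE8\xE2\xE5\xF2";

# Correct interpretation: treat the bytes as cp1251.
my $utf8 = transcode_title($cp1251, 'cp1251');

# Wrong interpretation: treat the very same bytes as cp866.
my $wrong = transcode_title($cp1251, 'cp866');

binmode STDOUT;                 # emit raw bytes, no extra layer
print "as cp1251: $utf8\n";
print "as cp866:  $wrong\n";
```

The first line of output is readable Russian on a UTF-8 terminal; the second is different text, the kind of wrongly encoded title described above.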

My questions are:

  1. Is this something specific to DOS, to Perl, or to the combination of both?
  2. How does the script behave on other systems (Windows, Linux), especially with encodings different from mine?

In reply to Perl, DOS and encodings by siberia-man
