Yeah, that's what I was talking about. "Since 2019" wow, time flies. I would have guessed 2021.

The deeper problem is that Perl doesn't know what character encoding *any* scalar is using and can't make fully intelligent choices about when to encode as UTF-8 or when to downgrade. This corresponds to the more general Unix problem of never knowing whether a user wants their Unix byte-oriented paths and environment to have UTF-8 or some other encoding. You can *kind of* guess based on whether their $ENV{LC_ALL} =~ /utf-8/ but there doesn't seem to really be any official "all things in my system should be unicode" setting.

Windows (NT onward at least) has always had an understanding of which codepage it was operating under, and official ways to exchange unicode outside of that codepage. Perl doesn't have any way to generically tap into this knowledge without a matching understanding on Unix (or IBM AS-4000 or VMS or all the other places where perl might run) So... you're just stuck always manually preparing the correct encoding of filenames on your own. It takes an unfortunate amount of education for people to get it right, though.

I also wrote up a Meditation about unicode filenames in general.


In reply to Re^3: utf8::downgrade() and file system operators by NERDVANA
in thread utf8::downgrade() and file system operators by hexcoder

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.