Hello,

I was researching how to get the results of readdir in utf8, and didn't find anything useful. The problem is that I want to read file names on win32 that contain non-Latin characters, which get mapped to '?' in my codepage.

I found that the problem has been discussed before, but couldn't find a suitable solution: the jperl hacks are discontinued, and Win32API::File has no FindFirst/FindNext entries.

If it is indeed not possible, I was thinking of introducing a switch in the perl core that would toggle readdir between returning bytes and returning utf8. The next step would probably be for open to recognize utf8 file names as well, but that's for later.

Another aspect is that the problem is wider than win32 - it is perfectly legal to create utf8 file names on unix file systems (of course, there one can always treat them as plain byte strings, which is not possible on win32); gnome utilities use this feature when run under UTF8 locales. The point is that if someone a) explicitly knows that their files have utf8 names and b) wants to access them with perl utf8 semantics and little hassle (and irrespective of the locale!), there's no way to do that except to mess with Encode.
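To make the "mess with Encode" route concrete, here is a minimal sketch of that workaround. It assumes the file system hands back names as UTF-8 bytes, which is typical on unix but not on win32, so it does nothing for the '?' problem there. The helper name readdir_utf8 is made up for illustration and is not an existing core or CPAN function.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (not part of core): read a directory and decode
# each entry from UTF-8 bytes into a Perl character string.
# Assumption: names on disk are UTF-8; malformed byte sequences are
# replaced with U+FFFD by Encode's default CHECK behaviour.
sub readdir_utf8 {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Cannot open directory '$dir': $!";
    my @names = map  { decode('UTF-8', $_) }
                grep { $_ ne '.' && $_ ne '..' }
                readdir($dh);
    closedir($dh);
    return @names;
}

# Usage: list the current directory with decoded names.
binmode(STDOUT, ':encoding(UTF-8)');
print "$_\n" for readdir_utf8('.');
```

Note that the decoded names have to be encoded back to bytes (Encode::encode) before they are passed to open, stat and friends, which is exactly the hassle a core switch would remove.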

So my questions are:
- Can readdir (as of now) return utf8 scalars?
- If not, is it a good idea to introduce such a change in core?
- If it is, what would be the most desirable form of the trigger? A new system variable, e.g. $UTF8_FILENAMES, or a new pragma like "use utf8 'filenames'" or "use utf8_filenames" or ...?

Thank you!


In reply to unicode version of readdir by dk
