G'day John,

Welcome to the Monastery.

"Maybe I've overlooked the obvious (if so I apologise)."

Showing us your "simple perl program" and describing your actual problem, with example input, output, error messages, and so on, would probably result in a better answer. As it is, we have to fall back on guesswork. I appreciate this is your first post, and I'm not trying to beat you over the head with the rule book, but please read "How do I post a question effectively?" and "Short, Self-Contained, Correct Example" to find out what sort of information to post in order to get the best answers.

One minor deviation from the information given in that first link: use '<pre>' blocks, instead of '<code>' blocks, for presenting Unicode data outside the 7-bit ASCII range. With '<code>' blocks, your Unicode characters will typically end up being shown as entity references, i.e. something like '&#NNNNNN;'. This won't happen with '<pre>' blocks; however, the drawbacks are that there's no "[download]" link, and that you have to manually change special characters in your code and data (e.g. '<' and '&') to their entities (e.g. '&lt;' and '&amp;'); there's a list of these after the textarea where you write your post. For inline Unicode characters, e.g. inside a '<p>' or '<li>' block, I typically use '<tt>' instead of '<pre>': this avoids '<pre>' being forced into a block format by, for instance, a style sheet.

Other than the actual tree walking, which could be part of your problem, the following script ("pm_1208191_read_utf8_filenames.pl") performs the reading and writing tasks you specify.

#!/usr/bin/env perl -l

use strict;
use warnings;
use autodie;

my $dir = 'pm_1208191_utf8_filenames';
my $out = 'pm_1208191_utf8_filenames_listing.txt';

open my $fh, '>', $out;
opendir(my $dh, $dir);
print $fh $_ while readdir $dh;

Given this test directory I set up:

$ ls -al pm_1208191_utf8_filenames
total 0
drwxr-xr-x  7 ken staff 238 Feb  1 11:45 .
drwxr-xr-x 18 ken staff 612 Feb  1 11:34 ..
-rw-r--r--  1 ken staff   0 Feb  1 11:34 abc
-rw-r--r--  1 ken staff   0 Feb  1 11:36 åßç
-rw-r--r--  1 ken staff   0 Feb  1 11:38 αβγ
-rw-r--r--  1 ken staff   0 Feb  1 11:41 абг
-rw-r--r--  1 ken staff   0 Feb  1 11:45 ☿♃♄

Here's a sample run:

$ cat pm_1208191_utf8_filenames_listing.txt
cat: pm_1208191_utf8_filenames_listing.txt: No such file or directory
$ pm_1208191_read_utf8_filenames.pl
$ cat pm_1208191_utf8_filenames_listing.txt
.
..
abc
åßç
αβγ
абг
☿♃♄

As you can see, I didn't need any special encoding-type directives. I'm using Perl 5.26.0 on macOS 10.12.5, with 'LANG=en_AU.UTF-8' (my normal setting).
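For the tree walking that the script above leaves out, one option is the core File::Find module. The following is only a minimal sketch, not a drop-in solution: the helper names 'list_tree' and 'write_listing' are my own inventions for illustration, and I'm assuming you want the same byte-for-byte pass-through of filenames as in the readdir script (no decoding or encoding anywhere).

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect every path under $top as the raw bytes
# the filesystem returns (no decoding, same as the readdir script).
sub list_tree {
    my ($top) = @_;
    my @paths;
    find(
        {
            wanted   => sub { push @paths, $File::Find::name },
            no_chdir => 1,    # stay in the current directory while walking
        },
        $top,
    );
    my @sorted = sort @paths;
    return @sorted;
}

# Hypothetical helper: write one path per line to $out, bytes unchanged.
sub write_listing {
    my ($top, $out) = @_;
    open my $fh, '>', $out or die "Can't write '$out': $!";
    print {$fh} "$_\n" for list_tree($top);
    close $fh or die "Can't close '$out': $!";
    return;
}

# Usage: script.pl TOP_DIR OUTPUT_FILE
write_listing(@ARGV) if @ARGV == 2;
```

Unlike plain readdir, File::Find skips the '.' and '..' entries and reports each path relative to the starting directory you give it.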

In case you can't actually see some of those characters, here's a table of the filenames, the three codepoints used for each, and a link to the Unicode PDF code chart so you can see what they look like.

Filename  Codepoints              Code Chart (PDF link)
abc       U+0061, U+0062, U+0063  C0 Controls and Basic Latin
åßç       U+00E5, U+00DF, U+00E7  C1 Controls and Latin-1 Supplement
αβγ       U+03B1, U+03B2, U+03B3  Greek and Coptic
абг       U+0430, U+0431, U+0433  Cyrillic
☿♃♄       U+263F, U+2643, U+2644  Miscellaneous Symbols
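If you want to check the codepoints of your own filenames, something like the following may help. It's a rough sketch: 'codepoints' is a hypothetical helper name, and it assumes the filename bytes you get back from readdir are valid UTF-8 (it dies if they aren't).

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn a UTF-8 encoded byte string (e.g. a name
# returned by readdir) into a "U+XXXX, U+XXXX, ..." list.
sub codepoints {
    my ($bytes) = @_;
    # FB_CROAK makes decode() die on malformed UTF-8 rather than
    # silently substituting replacement characters.
    my $chars = decode('UTF-8', $bytes, Encode::FB_CROAK);
    return join ', ', map { sprintf 'U+%04X', ord($_) } split //, $chars;
}

# The six bytes below are the UTF-8 encoding of the filename "åßç".
print codepoints("\xC3\xA5\xC3\x9F\xC3\xA7"), "\n";   # U+00E5, U+00DF, U+00E7
```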

Take a look at "Re: printing Unicode works for some characters but not all", which I wrote some months ago. This may shed some light on whatever problems you're encountering — clearly, this is one of those guesswork answers I mentioned earlier.

The open pragma statement you're looking for might be something like:

use open IO => qw{:encoding(UTF-8) :std};

Again, that's more guesswork as you haven't shown your script or adequately described your problem.
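To illustrate what that pragma changes, here's a hedged sketch. As far as I know, readdir hands back undecoded bytes on Unix-like systems, so once your output handles have an ':encoding(UTF-8)' layer you'd want to decode the names first; otherwise they get encoded a second time. The name 'write_decoded_listing' is hypothetical, and I've used explicit 'or die' checks instead of autodie just to keep the sketch self-contained.

```perl
use strict;
use warnings;
use Encode qw(decode);
use open IO => qw{:encoding(UTF-8) :std};

# Hypothetical helper: read $dir, decode each name from UTF-8 bytes to
# characters, and write one name per line to $out.
sub write_decoded_listing {
    my ($dir, $out) = @_;

    opendir(my $dh, $dir) or die "Can't read directory '$dir': $!";
    my @names = sort map { decode('UTF-8', $_) } readdir $dh;
    closedir $dh;

    # No layer given here, so the open pragma supplies :encoding(UTF-8)
    # and the characters are encoded exactly once on output.
    open my $fh, '>', $out or die "Can't write '$out': $!";
    print {$fh} "$_\n" for @names;
    close $fh or die "Can't close '$out': $!";

    return @names;
}
```

Whether you need any of this depends on what else your script does with the names; if you only copy them from readdir to a file, the undecorated version earlier in this post worked fine for me.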

— Ken


In reply to Re: UTF-8 and readdir, etc. by kcott
in thread UTF-8 and readdir, etc. by jrw005
