Dear Monks,
I am no stranger to the intricacies of character set encodings and conversions between them, but after reading perluniintro, Encode, the open pragma, etc., I am still a bit confused as to what exactly one needs to do in my case:
Imagine a Linux-based Apache2 webserver with DAV, and some directories available for Windows PC users to upload files. These users connect to these directories by means of the Windows-based "web folders" (DAV), and can thus drag-and-drop files in Windows Explorer from their local PC filesystem to the directories located on the server. The users are, for instance, Spanish, and upload files named, for instance, "Presentación.ppt" (notice the accent on the ó).
These files land on the server with their filenames evidently UTF-8 encoded.
On the other hand, these same server directories are "exported" by means of Samba, so that they are visible to a Perl script running on a separate Windows system (Perl v5.8.8). The script opens and reads these directories by means of
opendir(DIR, $dirpath) or die "Cannot open $dirpath: $!"; my @A = readdir DIR; closedir DIR; ... my $entry = shift @A; ...
And the question is: in what character encoding will the directory entry "Presentación.ppt" be, and on what does that depend?
(From my tests, it would seem that the entry is treated as 'bytes', and that these bytes include the 2 bytes that represent the UTF-8-encoded character "ó". In other words, Perl reads the entry with all its bytes intact, but $entry does not have the "is_utf8" flag set.)
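To make that observation checkable, here is a minimal sketch of how the bytes of an entry could be inspected and decoded. The helper names hex_dump and decode_entry are made up for this illustration, and the sample $entry is hard-coded to the byte sequence I believe I am seeing (0xC3 0xB3 for "ó") rather than read from a real directory.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: render a string as hex bytes, so the encoding can be
# inspected without depending on the console's code page.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord } split //, $bytes;
}

# Hypothetical helper: decode an entry as strict UTF-8.
# Returns the character string, or undef if the bytes are not valid UTF-8.
sub decode_entry {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its input
    return scalar eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
}

# Assumed sample: the entry as raw bytes, with "ó" stored as 0xC3 0xB3.
my $entry = "Presentaci\xC3\xB3n.ppt";
my $chars = decode_entry($entry);

print hex_dump($entry), "\n";
printf "bytes=%d is_utf8=%d\n", length($entry), utf8::is_utf8($entry) ? 1 : 0;
printf "chars=%d is_utf8=%d\n", length($chars), utf8::is_utf8($chars) ? 1 : 0
    if defined $chars;
```

If decode_entry returns undef for a real entry, the name is not valid UTF-8, and some other encoding (a Windows code page, for instance) would have to be considered instead.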
A secondary wonder is that, when I do a
if (-f "$dirpath/$entry")
the result is false, and if I try to open() the file, it returns an error.
Additional note: files that have no "accented characters" in their names are seen and processed fine. Similarly, if I manually rename the server file to, for instance, "Presentacion.ppt", it is handled fine thereafter (meaning that the -f now returns true, and the open() works).
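One way I could narrow this down is to try several byte spellings of the same name and see which one, if any, passes the -f test. The following is only a diagnostic sketch: find_working_name is a made-up helper, and the encodings passed to it (cp1252 and cp850 in the usage line) are guesses on my part, not something I know Windows or Samba to require.

```perl
use strict;
use warnings;
use Encode qw(decode encode);
use File::Spec;

# Hypothetical helper: given a directory, a raw readdir entry and a list of
# candidate encodings, return the first spelling of the name for which -f
# succeeds, or undef if none does.
sub find_working_name {
    my ($dirpath, $entry, @encodings) = @_;

    my @candidates = ($entry);    # always try the raw bytes first

    # If the raw bytes are valid UTF-8, also try them re-encoded.
    my $copy  = $entry;
    my $chars = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    if (defined $chars) {
        for my $enc (@encodings) {
            my $tmp   = $chars;
            # encode() croaks if the encoding is unknown or cannot
            # represent a character; such candidates are skipped.
            my $bytes = eval { encode($enc, $tmp, Encode::FB_CROAK) };
            push @candidates, $bytes if defined $bytes;
        }
    }

    for my $name (@candidates) {
        return $name if -f File::Spec->catfile($dirpath, $name);
    }
    return;
}
```

It would be called as find_working_name($dirpath, $entry, 'cp1252', 'cp850'). If a re-encoded spelling works where the raw entry does not, that would point at a mismatch between the encoding readdir hands back and the one the file tests expect.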
I would be grateful for any tip. André (with accent)

In reply to utf8 in directory and filenames by soliplaya
