Please read perlunitut and perluniadvice.

Filenames live outside Perl, so if you treat them as text you have to decode and encode them explicitly every time (and then hope the resulting bytes match exactly how the name is stored). In other words: for filenames, use byte strings, not Unicode text strings.
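If you do start from a text string, something like this is what "encode explicitly" means. It is only a sketch: it assumes the filesystem stores names as UTF-8 bytes, which you cannot know in general, and the temporary directory is just there to make the example self-contained.

```perl
use strict;
use warnings;
use utf8;                          # the literal below is UTF-8 source text
use Encode qw(encode);
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);  # scratch directory for the example
my $name = "Presentación.ppt";     # text string: 16 characters

# ASSUMPTION: the filesystem wants UTF-8 bytes. Nothing in Perl verifies this.
my $bytes = encode('UTF-8', $name);  # byte string: 17 bytes

open my $fh, '>', "$dir/$bytes" or die "Cannot create file: $!";
close $fh;
```

From that point on, keep using $bytes (not $name) whenever you talk to the filesystem.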

A filename that you get from readdir or glob is already a properly encoded byte string. You can use it to open a file, without decoding or encoding the string.
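A small sketch of that round trip, assuming a filesystem that hands names back byte-for-byte (typical on Linux). The file is created in a temporary directory only so the example runs on its own; the name is spelled out as raw bytes on purpose.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

# Create a file whose name contains the UTF-8 bytes for "ó" (0xC3 0xB3).
my $raw_name = "Presentaci\xC3\xB3n.ppt";
open my $out, '>', "$dir/$raw_name" or die "create: $!";
print {$out} "hello\n";
close $out;

# Read the directory; skip the "." and ".." entries.
opendir my $dh, $dir or die "opendir: $!";
my @entries = grep { $_ ne '.' && $_ ne '..' } readdir $dh;
closedir $dh;

my @contents;
for my $entry (@entries) {
    # $entry is a byte string: pass it to open unchanged, no decode/encode.
    open my $in, '<', "$dir/$entry" or die "open: $!";
    push @contents, <$in>;
    close $in;
}
```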

And the question is: what character encoding will the directory entry "Presentación.ppt" be in, and what does it depend on?

The character encoding depends on the filesystem and on any encoding layers its implementation uses. In any case, you cannot be sure in a platform-independent way. (Yes, that sucks.)

Perl reads the entry with all its bytes correctly, but $entry does not have the "is_utf8" flag set.

That would be wrong. A filename is a binary string, not a text string. It consists of bytes, not characters. In order to use it as a text string, you have to decode it first. But it can be very hard to find out HOW to decode it, and certainly perl can't figure it out for you.
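If you only need a text version for display, you can try decoding with an encoding you guess and see whether it fails. This is a sketch, not a recipe: filename_as_text is a made-up helper name, and the choice of 'UTF-8' is a guess about the filesystem, not something perl can confirm.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: turn filename bytes into text, for display only.
# Returns undef when the bytes are not valid in the guessed encoding.
sub filename_as_text {
    my ($bytes, $encoding) = @_;
    my $copy = $bytes;   # decode may modify its input when a CHECK value is given
    return scalar eval { decode($encoding, $copy, Encode::FB_CROAK()) };
}

my $entry = "Presentaci\xC3\xB3n.ppt";             # bytes as readdir might return them
my $text  = filename_as_text($entry, 'UTF-8');     # decodes to 16 characters

# The same name stored as Latin-1 bytes is not valid UTF-8, so the guess fails.
my $bad   = filename_as_text("Presentaci\xF3n.ppt", 'UTF-8');
```

Keep $entry around for open, unlink and friends; $text is only for showing to humans.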

it returns an error.

Which error?

Juerd # { site => 'juerd.nl', do_not_use => 'spamtrap', perl6_server => 'feather' }


In reply to Re: utf8 in directory and filenames by Juerd
in thread utf8 in directory and filenames by soliplaya
