I'm Portuguese and, like many people who live in countries with Latin-derived languages (Portuguese, Spanish, French, Italian, etc.), I have to deal with accented file names. Other, non-Latin languages surely have the same problem (German, Dutch, etc.). The context here is Windows with NTFS drives, which store file names in Unicode. I'm using the latest Perl version, which supports Unicode.

For example, I have a directory/folder in "c:\users\someuser\documents" named "documentação" ("documentation" in English). The full path is "c:\users\someuser\documents\documentação". Now, if I do this:

use strict;
our $dp;
$dp = "c:\\Users\\someuser\\Documents\\documentação";
if (-d $dp) { print "ok\n"; } else { print "nope\n"; }

It will return "nope"...
If I change the text to "documenta\x{00E7}\x{00E3}o", it returns "ok"...
Printing the string variable shows the same text in both cases...
If I use opendir/readdir in the "c:\users\someuser\documents" directory it will read "documentação" perfectly and -d will work fine...
The -d simply does not work with the direct text on the string variable...
If I add code to set the variable from a command-line argument in a DOS console, it also returns "ok".
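One possible explanation for the observations above, not confirmed anywhere in this post, is that the script file is saved as UTF-8 while Perl's file tests on Windows receive the string as raw bytes in the system ANSI code page. The following is only a diagnostic sketch under those assumptions: it assumes a UTF-8 source file and Windows-1252 as the ANSI code page, and `hex_bytes` is a hypothetical helper added for illustration. It prints the bytes of both encodings and checks that the Windows-1252 form matches the `\x{00E7}\x{00E3}` spelling that returned "ok".

```perl
use strict;
use warnings;
use utf8;                  # assumption: this source file is saved as UTF-8
use Encode qw(encode);

# With "use utf8", the literal is a string of characters, not UTF-8 bytes.
my $name = "documentação";

# Bytes a UTF-8 source file contains for this name (what Perl sees without "use utf8").
my $utf8_bytes = encode('UTF-8', $name);

# Bytes for the same name in Windows-1252 (assumed ANSI code page).
my $ansi_bytes = encode('cp1252', $name);

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

print "UTF-8 : ", hex_bytes($utf8_bytes), "\n";
print "cp1252: ", hex_bytes($ansi_bytes), "\n";

# The escaped spelling from the post is the same string as the cp1252 bytes.
my $escaped = "documenta\x{00E7}\x{00E3}o";
print "matches escaped form: ", ($ansi_bytes eq $escaped ? "yes" : "no"), "\n";
```

If the two hex lines differ only in the accented characters (two bytes each in UTF-8, one byte each in Windows-1252), that would be consistent with the literal failing and the escaped form succeeding.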

I wasted hours reading the Unicode and Perl documentation and trying different methods (utf8, encoding, decoding, locale, etc.) to correct this, but nothing works. It is a problem with the way Perl encodes the string internally. I suppose some Perl command-line option might solve the issue, but that is not the way to resolve this.
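For reference, here is a minimal sketch of one commonly suggested combination of those methods: declare the source as UTF-8 with `use utf8`, then encode the path to the ANSI code page before handing it to `-d`. It is not a confirmed fix for the case described here. It assumes the script is saved as UTF-8 and that the system ANSI code page is Windows-1252; `ansi_path` is a hypothetical helper name, and the path is the example from this post.

```perl
use strict;
use warnings;
use utf8;                  # assumption: this source file is saved as UTF-8
use Encode qw(encode);

# Hypothetical helper: convert a character string to bytes in the given
# code page (defaults to cp1252, which is an assumption about the system).
sub ansi_path {
    my ($path, $code_page) = @_;
    $code_page //= 'cp1252';
    return encode($code_page, $path);
}

my $dp       = "c:\\Users\\someuser\\Documents\\documentação";
my $dp_bytes = ansi_path($dp);

# Test the encoded byte string rather than the character string.
if (-d $dp_bytes) { print "ok\n"; } else { print "nope\n"; }
```

A name containing characters outside the chosen code page cannot be represented this way, so this sketch only covers names that fit in that code page.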

(Post edited in the meantime: the solution I had found did not work.)

Unicode is a wonderful thing, but reading about its evolution you start to think that Unicode is now at the same level of confusion as the old code pages were... I hope that someone teaches me a lesson, or that this sort of weirdness can be solved in future versions of Perl.

Thank you / Obrigado.


In reply to Accent file names issue by ruimelo73
