PerlMonks  


I'm Portuguese and, like many people who live in countries with Latin-derived languages (Portuguese, Spanish, French, Italian, etc.), I have to deal with accented file names. Other, non-Latin languages surely have the same problem (German, Dutch, etc.). The context here is Windows with NTFS drives, where file names are stored as Unicode. I'm using the latest Perl version, which supports Unicode.

For example, I have a directory/folder in "c:\users\someuser\documents" named "documentação" ("documentation" in English). The full path is "c:\users\someuser\documents\documentação". Now, if I do this:

use strict;
our $dp;
$dp = "c:\\Users\\someuser\\Documents\\documentação";
if (-d $dp) {
    print "ok\n";
} else {
    print "nope\n";
}

It prints "nope"...
If I change the text to "documenta\x{00E7}\x{00E3}o", it prints "ok"...
Printing the string variable shows the same thing either way...
If I use opendir/readdir on the "c:\users\someuser\documents" directory, it reads "documentação" perfectly and -d works fine on the result...
The -d test simply does not work when the text is written directly in the string variable...
If I add code to set the variable from a command-line argument in a DOS console, it also prints "ok".
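
One possible reading of these symptoms, offered as a sketch and not a confirmed diagnosis: if the script file is saved as UTF-8, the literal "ç" and "ã" each reach perl as two bytes, while "\x{00E7}\x{00E3}" yields the single bytes that code page 1252 uses for them. The example below assumes exactly that (a UTF-8 source file, a Windows ANSI code page of 1252, and file tests that take the path in the ANSI code page). The helper name to_ansi_path is hypothetical; decode and encode come from the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: these are the bytes the literal "documentação" turns into
# when the script is saved as UTF-8 and "use utf8" is not in effect.
my $utf8_bytes = "documenta\xC3\xA7\xC3\xA3o";

# The spelling from the post that made -d print "ok".
my $escaped = "documenta\x{00E7}\x{00E3}o";

# Hypothetical helper: turn UTF-8 bytes into code page 1252 bytes.
# cp1252 is an assumption; other systems may use a different ANSI code page.
sub to_ansi_path {
    my ($bytes) = @_;
    my $chars = decode('UTF-8', $bytes);   # bytes -> characters
    return encode('cp1252', $chars);       # characters -> ANSI bytes
}

my $ansi = to_ansi_path($utf8_bytes);

# Compare the converted name with the escaped spelling.
print $ansi eq $escaped ? "same bytes\n" : "different bytes\n";

# On the machine from the post this would be the string to hand to -d.
my $dp = "c:\\Users\\someuser\\Documents\\" . $ansi;
print( (-d $dp) ? "ok\n" : "nope\n" );
```

Under those assumptions the first line printed is "same bytes", which would explain why the escaped form, readdir results and console arguments all pass -d while the raw literal does not. The second line depends on whether the directory exists on the machine running the script.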

I wasted hours reading Unicode and Perl documentation and trying different methods (utf8, encoding, decoding, locale, etc.) to correct this, but nothing works. It is a problem with the way Perl encodes the string internally. I suppose that some Perl command-line option could solve the issue, but that is not the right way to resolve this.

(post edited meanwhile, the solution I have found did not work)

Unicode is a wonderful thing, but reading about its evolution you start to think that Unicode has now reached the same level of confusion as the old code pages... I hope that someone teaches me a lesson, or that this sort of weirdness can be solved in future versions of Perl.

Thank you / Obrigado.


In reply to Accent file names issue by ruimelo73

As of 2024-04-20 04:48 GMT